Working Student (m/f/d) LLM Agent Evaluation & Benchmarking

Agile Robots SE

📍 Munich, Bavaria, Germany

Part-time · Computer Occupations · Posted June 05, 2026


Job Description

About the role
We are looking for a Working Student (m/f/d) for LLM Agent Evaluation & Benchmarking. In this role, you will design and build an agent-agnostic benchmarking harness, run comparative evaluations across frontier and local models, and translate the findings into improvements to prompts, guards, and tool schemas.

Your Responsibilities
Harness Development: Design and build an agent-agnostic benchmarking harness that executes versioned task suites against frontier and local models with reproducible, version-controlled runs.
Task Suite Design: Define and maintain evaluation task suites that measure task success, grounding accuracy, latency, and cost across the agent portfolio.
Model Evaluation: Run period...

