Job Description
Job Title: AI Agent Evaluation Engineer
Experience Level: 5-7 Years (6+ years in Software QA required)
Work Split: 70% Automation / 30% Manual Testing
Key Focus: Responsible AI, Safety Evals, and Google ADK
AI/LLM Testing Experience: Minimum 2 years focused specifically on testing/evaluating AI systems, conversational agents, or LLMs.
Safety & Red Teaming: Direct experience in \"Safety Evals\" is mandatory. This includes red teaming, adversarial testing, jailbreaking, and measuring toxicity/bias.
Google ADK Knowledge: Must have direct experience or a strong conceptual understanding of the Google Agent Development Kit (ADK) and Vertex AI.
Technical Stack:
o Strong proficiency in Python for scripting and automation.
o Experience with PyTest .
o Prompt Injection experience.
Tooling Familiarity: Experience with libraries such as Langsmith, DeepEval, Ragas, Giskard, or Hugging Face.
Responsibilities:
- Develop synthetic testing environments and simulation strategies to stress-test agents under various real-world conditions.
- Design, implement, and maintain scalable and repeatable evaluation datasets and metrics to test agent performance, robustness, safety, and alignment.
- Develop and execute adversarial testing, jailbreaking, and red-teaming methodologies to identify potential harm, bias, toxicity, and unauthorized behavior in agent responses.
- Implement and measure adherence to established ethical guidelines, safety policies, and content filtering mechanisms.
- Define comprehensive QA strategies, including functional, integration, regression, and user acceptance testing for conversational and goal-oriented AI agents.
- Develop and execute detailed Test artefacts such as test plans, test cases, test scenarios for agent features, tool use, memory, and reasoning capabilities.
- Identify, document, prioritize, and track bugs using Jira, performance degradations, and alignment failures in agent behavior.
- Collaborate closely with AI/ML Engineers and Researchers to analyze root causes and validate fixes.
- Integrate evaluation pipelines into the CI/CD process to enable continuous quality assurance and fast iteration cycles.
- Analyze and interpret evaluation results, providing clear, actionable insights and quality reports to stakeholders and development teams.
If your skills align with the job description, we kindly request you to fill out the form.
Link:
Apply for this Position
Ready to join ? Click the button below to submit your application.
Submit Application