Job Description

Data Scientist

- Design and implement end-to-end evaluation frameworks to assess performance, reliability, and safety of multi-agent AI systems.
- Lead experimentation and A/B testing efforts to systematically test hypotheses, validate model improvements, and track performance across agent iterations.
- Curate and maintain high-quality ground truth datasets to enable accurate, reproducible evaluation of multi-agent outputs.
- Identify and address reliability and accuracy gaps across agent workflows, failure modes, and edge cases in production-like environments.
- Stay current on emerging research in agentic AI, LLM evaluation, and multi-agent coordination to continuously improve framework design.

Technical Skills

- Proficiency in Python and ML frameworks.
- Hands-on experience with LLM APIs and agentic frameworks (LangChain, LlamaIndex, Semantic Kernel).
- Familiarity with evaluation tooling (Ragas, DeepEval, LangSmith, or similar).
- Experience with data pipelines, experiment tracking (M...

Apply for this Position

Ready to join Giggso? Submit your application for this position.