Job Description

Role Overview

We are seeking a hands-on Testing Lead to own quality and documentation for our Deep Learning, LLM, and Vision-Language Model (VLM) products. You will define how we test, measure, document, and communicate AI quality—working closely with ML, Engineering, and Product teams in a fast-paced startup environment.

This role is ideal for someone who believes clear documentation is as critical as good testing, especially for non-deterministic AI systems.



What You’ll Do

Own Quality & Documentation End-to-End

  • Define testing strategy for LLMs, VLMs, and DL pipelines.
  • Create and maintain clear, lightweight documentation covering:
    • Model testing strategies and assumptions
    • Evaluation metrics and acceptance criteria
    • Known limitations, risks, and failure modes
    • Release readiness and quality sign-off
  • Ensure documentation evolves with models, data, and prompts.

LLM / GenAI Testing

  • Design tests for:
    • Prompt templates and prompt changes
    • RAG pipelines (retrieval quality, grounding, hallucination control)
    • Multi-turn conversations and long-context behaviour
  • Maintain golden datasets, regression test suites, and test result summaries.
  • Document prompt behaviour, edge cases, and known model quirks.

Vision & Multimodal Testing

  • Test VLMs for image-text alignment, OCR, captioning, and reasoning.
  • Document model performance across different image types, quality levels, and domains.
  • Track and publish model behaviour changes between versions.

Automation, MLOps & Reporting

  • Build Python-based automation for evaluation and regression testing.
  • Integrate tests into CI/CD and MLOps pipelines.
  • Produce readable quality reports and dashboards for engineers and leadership.
  • Monitor and document production issues such as model/data drift and degradation.

Build a Quality-First Culture

  • Establish QA and documentation standards that scale with a startup.
  • Mentor engineers on writing testable code and meaningful documentation.
  • Act as the single source of truth for AI quality, testing, and known risks.


What We’re Looking For

Must-Have

  • Strong background in software testing with lead or ownership experience.
  • Hands-on experience testing LLMs, DL models, or GenAI systems.
  • Strong Python skills for test automation and data validation.
  • Proven ability to write clear, structured technical documentation.
  • Understanding of:
    • Transformer-based models and DL workflows
    • Model evaluation metrics and non-deterministic system testing
  • Comfort with ambiguity and the ability to move fast in a startup environment.

Nice-to-Have

  • Experience with VLMs, multimodal models, or computer vision.
  • Exposure to RAG architectures, vector databases, and embeddings.
  • Familiarity with tools like LangChain, LlamaIndex, MLflow, or similar.
  • Experience documenting AI risks, limitations, or compliance requirements.



Apply for this Position

Ready to join? Click the button below to submit your application.

Submit Application