Job Description
Role Overview
We are seeking a hands-on Testing Lead to own quality and documentation for our Deep Learning, LLM, and Vision-Language Model (VLM) products. You will define how we test, measure, document, and communicate AI quality—working closely with ML, Engineering, and Product teams in a fast-paced startup environment.
This role is ideal for someone who believes clear documentation is as critical as good testing, especially for non-deterministic AI systems.
What You’ll Do
Own Quality & Documentation End-to-End
- Define testing strategy for LLMs, VLMs, and DL pipelines.
- Create and maintain clear, lightweight documentation covering:
  - Model testing strategies and assumptions
  - Evaluation metrics and acceptance criteria
  - Known limitations, risks, and failure modes
  - Release readiness and quality sign-off
- Ensure documentation evolves with models, data, and prompts.
LLM / GenAI Testing
- Design tests for:
  - Prompt templates and prompt changes
  - RAG pipelines (retrieval quality, grounding, hallucination control)
  - Multi-turn conversations and long-context behaviour
- Maintain golden datasets, regression test suites, and test result summaries.
- Document prompt behaviour, edge cases, and known model quirks.
Vision & Multimodal Testing
- Test VLMs for image-text alignment, OCR, captioning, and reasoning.
- Document model performance across different image types, quality levels, and domains.
- Track and publish model behaviour changes between versions.
Automation, MLOps & Reporting
- Build Python-based automation for evaluation and regression testing.
- Integrate tests into CI/CD and MLOps pipelines.
- Produce readable quality reports and dashboards for engineers and leadership.
- Monitor and document production issues such as model/data drift and degradation.
Build a Quality-First Culture
- Establish QA and documentation standards that scale with a startup.
- Mentor engineers on writing testable code and meaningful documentation.
- Act as the single source of truth for AI quality, testing, and known risks.
What We’re Looking For
Must-Have
- Strong background in software testing with lead or ownership experience.
- Hands-on experience testing LLMs, DL models, or GenAI systems.
- Strong Python skills for test automation and data validation.
- Proven ability to write clear, structured technical documentation.
- Understanding of:
  - Transformer-based models and DL workflows
  - Model evaluation metrics and non-deterministic system testing
- Comfort with ambiguity and the ability to move fast in a startup environment.
Nice-to-Have
- Experience with VLMs, multimodal models, or computer vision.
- Exposure to RAG architectures, vector databases, and embeddings.
- Familiarity with tools like LangChain, LlamaIndex, MLflow, or similar.
- Experience documenting AI risks, limitations, or compliance requirements.
Interested candidates can apply to