Job Description

Job Title: Senior Generative AI Engineer (Drafting & RAG Systems)
Role Overview
We are looking for a Senior Generative AI Engineer to lead the development and deployment of our next-generation Automated Drafting Tool. You will own the entire lifecycle of the AI features, from local prototyping with Ollama to scaling globally via OpenAI APIs.
The ideal candidate has a "Full-Stack AI" mindset: you understand how to retrieve context using RAG, manage high-dimensional data in Vector Databases, and ensure the final drafted output is coherent, accurate, and contextually aware.
Key Responsibilities
1. AI Architecture & Drafting Logic
- Design and implement end-to-end Retrieval-Augmented Generation (RAG) pipelines specifically optimized for document drafting.
- Develop advanced Prompt Engineering strategies to handle complex drafting constraints (tone, legal/technical compliance, and formatting).
- Implement hybrid model strategies, utilizing Ollama for local development, testing, and privacy-sensitive tasks, while orchestrating OpenAI (GPT-4o/o1) for production-level reasoning.
2. Data & Vector Engineering
- Build and maintain scalable Vector Databases (e.g., Pinecone, Weaviate, Milvus, or FAISS).
- Optimize document ingestion pipelines: chunking strategies, embedding model selection, and metadata filtering to improve retrieval precision.
- Implement "Agentic RAG" where the system can self-correct or multi-step reason through a draft.
3. Deployment & MLOps (Local to Cloud)
- Bridge the gap between local ideation (running models on Ollama/Local GPUs) and cloud production environments.
- Deploy AI services using containerization (Docker/Kubernetes) and manage API latency, rate limits, and token costs.
- Establish monitoring for AI performance, including hallucination detection and "groundedness" metrics.
Required Skills & Qualifications
Mandatory Experience
- Experience: 3+ years of professional experience in AI/Machine Learning or Backend Engineering with a strong generative AI focus.
- LLM Orchestration: Deep hands-on experience with LangChain or LlamaIndex.
- Model Proficiency: Expert knowledge of the OpenAI API ecosystem and local model runners like Ollama.
- Vector Expertise: Proven track record of implementing and optimizing Vector Databases and RAG workflows.
- Programming: Mastery of Python (FastAPI/Flask) and asynchronous programming.
- Tooling: Hands-on experience with Jira and Confluence is a must.
Technical Stack
- Models: OpenAI (GPT-4), Ollama (Llama 3, Mistral, Mixtral).
- Tools: LangChain, LlamaIndex, LangSmith (for tracing).
- Databases: Pinecone, Chroma, or pgvector.
- Infrastructure: Docker, AWS/GCP/Azure, GitHub Actions for CI/CD.
What We Look For (The "Hacker" Mindset)
- Production Proven: You have taken at least one Gen AI product from a Jupyter notebook or local script to a live environment with real users.
- Problem Solver: You know how to handle the "stochastic" nature of LLMs and can build guardrails to prevent hallucinations in drafting.
- Architecture First: You care about token optimization and latency just as much as you care about the quality of the text generated.

Apply for this Position

Ready to join? Submit your application below.