Job Description
Job Title: Senior Generative AI Engineer (Drafting & RAG Systems)
Role Overview
We are looking for a Senior Generative AI Engineer to lead the development and deployment of our next-generation Automated Drafting Tool. You will own the entire lifecycle of our AI features, from local prototyping with Ollama to production deployment at scale via the OpenAI API.
The ideal candidate has a "Full-Stack AI" mindset: you understand how to retrieve context using RAG, manage high-dimensional data in Vector Databases, and ensure the final drafted output is coherent, accurate, and contextually aware.
Key Responsibilities
1. AI Architecture & Drafting Logic
- Design and implement end-to-end Retrieval-Augmented Generation (RAG) pipelines specifically optimized for document drafting.
- Develop advanced Prompt Engineering strategies to handle complex drafting constraints (tone, legal/technical compliance, and formatting).
- Implement hybrid model strategies, using Ollama for local development, testing, and privacy-sensitive tasks while orchestrating OpenAI models (GPT-4o, o1) for production-level reasoning.
2. Data & Vector Engineering
- Build and maintain scalable Vector Databases (e.g., Pinecone, Weaviate, Milvus, or FAISS).
- Optimize document ingestion pipelines: chunking strategies, embedding model selection, and metadata filtering to improve retrieval precision.
- Implement "Agentic RAG" workflows in which the system can self-correct and reason through a draft in multiple steps.
3. Deployment & MLOps (Local to Cloud)
- Bridge the gap between local ideation (running models on Ollama/Local GPUs) and cloud production environments.
- Deploy AI services using containerization (Docker/Kubernetes) and manage API latency, rate limits, and token costs.
- Establish monitoring for AI performance, including hallucination detection and "groundedness" metrics.
Required Skills & Qualifications
Mandatory Experience
- Experience: 3+ years of professional experience in AI/Machine Learning or Backend Engineering with a strong GenAI focus.
- LLM Orchestration: Deep hands-on experience with LangChain or LlamaIndex.
- Model Proficiency: Expert knowledge of the OpenAI API ecosystem and local model runners like Ollama.
- Vector Expertise: Proven track record of implementing and optimizing Vector Databases and RAG workflows.
- Programming: Mastery of Python (FastAPI/Flask) and asynchronous programming.
- Collaboration Tools: Hands-on exposure to Jira and Confluence is a must.
Technical Stack
- Models: OpenAI (GPT-4), Ollama (Llama 3, Mistral, Mixtral).
- Tools: LangChain, LlamaIndex, LangSmith (for tracing).
- Database: Pinecone, ChromaDB, or pgvector.
- Infrastructure: Docker, AWS/GCP/Azure, GitHub Actions for CI/CD.
What We Look For (The "Hacker" Mindset)
- Production Proven: You have moved at least one GenAI product from a Jupyter notebook or local script to a live environment with real users.
- Problem Solver: You know how to handle the "stochastic" nature of LLMs and can build guardrails to prevent hallucinations in drafting.
- Architecture First: You care about token optimization and latency just as much as you care about the quality of the text generated.
Apply for this Position
Ready to join? Submit your application to be considered for this role.