Job Description

Employment Type: Full-Time with Cogent IBS
Work Mode: Remote
Location: Open to candidates from India and LATAM countries

Responsibilities:
• Design and implement the Agentic RAG architecture for large-scale data standardization.
• Build scalable ETL pipelines using PySpark and Databricks for processing 1.2B+ records.
• Design, develop, and optimize Databricks jobs, workflows, and clusters.
• Leverage Delta Lake for reliable, scalable data storage and processing.
• Integrate vector databases, LLM orchestration, and external catalog APIs (MTP, PCdb, VCdb).
• Implement confidence scoring, retry logic, and human-in-the-loop workflows (see the sketch after this list).
• Optimize performance, cost, and scalability across distributed pipelines.
• Ensure auditability, lineage tracking, and data governance compliance.
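To give candidates a concrete feel for the confidence-scoring, retry, and human-in-the-loop work referenced above, here is a minimal PySpark sketch. The threshold, table names, and keyword-based score are hypothetical placeholders; in the actual pipeline, confidence would come from the Agentic RAG matching step against the catalog APIs (MTP, PCdb, VCdb).

```python
# Hypothetical sketch: confidence-based routing with retry logic.
# Table names, the 0.85 threshold, and the heuristic score are placeholders.
import time

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff, tuned per use case

# Toy input standing in for the 1.2B+ record source table.
records = spark.createDataFrame(
    [("p1", "brake pad frnt"), ("p2", "oil filter")],
    ["part_id", "raw_description"],
)

# Stand-in for the LLM/RAG matching step: a trivial keyword heuristic here.
scored = records.withColumn(
    "confidence",
    F.when(F.col("raw_description").contains("filter"), F.lit(0.95))
     .otherwise(F.lit(0.60)),
)

def write_with_retry(df, table, attempts=3):
    # Simple retry wrapper with exponential backoff (assumed policy: 3 attempts).
    for i in range(attempts):
        try:
            df.write.format("delta").mode("append").saveAsTable(table)
            return
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(2 ** i)

# Route by confidence: high-confidence rows auto-standardize; the rest land
# in a human-review queue (both hypothetical Delta tables on Databricks).
write_with_retry(scored.filter(F.col("confidence") >= CONFIDENCE_THRESHOLD),
                 "standardized_parts")
write_with_retry(scored.filter(F.col("confidence") < CONFIDENCE_THRESHOLD),
                 "human_review_queue")
```
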
Requirements/Skills:
• Strong Python, PySpark, and SQL expertise.
• Hands-on Databricks experience (notebooks, workflows, jobs, Delta Lake).
• Experience with large-scale ETL and distributed data processing.
• Working knowledge of LLMs, RAG, embeddings, and vector databases.
• Cloud experience on AWS, Azure, or GCP.
• Experience integrating REST APIs and external data catalogs.
• Familiarity with data governance, monitoring, and logging frameworks.

Apply for this Position

Ready to join? Submit your application to be considered.