Job Description
Position: AI Data Engineer
Employment Type: Full-Time with Cogent IBS
Work Mode: Remote
Need resources from India (primarily) and APAC countries (secondary)
Responsibilities:
• Design and implement the Agentic RAG architecture for large-scale data standardization.
• Build scalable ETL pipelines using PySpark and Databricks for processing 1.2B+ records.
• Design, develop, and optimize Databricks jobs, workflows, and clusters.
• Leverage Delta Lake for reliable, scalable data storage and processing.
• Integrate vector databases, LLM orchestration, and external catalog APIs (MTP, PCdb, VCdb).
• Implement confidence scoring, retry logic, and human-in-the-loop workflows.
• Optimize performance, cost, and scalability across distributed pipelines.
• Ensure auditability, lineage tracking, and data governance compliance.
Requirements/Skills:
• Strong Python, PySpark, and SQL expertise.
• Hands-on Databricks experience (notebooks, workflows, jobs, Delta Lake).
• Experience with large-scale ETL and distributed data processing.
• Working knowledge of LLMs, RAG, embeddings, and vector databases.
• Cloud experience on AWS, Azure, or GCP.
• Experience integrating REST APIs and external data catalogs.
• Familiarity with data governance, monitoring, and logging frameworks.
Interested candidates, please send your resume to ๐๐๐๐๐@๐๐๐๐๐๐๐๐๐.๐๐๐ and mention "AI Data Engineer" in the subject line.