Job Description

Position: AI Data Engineer

Employment Type: Full-Time with Cogent IBS

Work Mode: Remote

Need resources from India (primarily) and LATAM countries (secondary)


Responsibilities:

• Design and implement the Agentic RAG architecture for large-scale data standardization.

• Build scalable ETL pipelines using PySpark and Databricks for processing 1.2B+ records.

• Design, develop, and optimize Databricks jobs, workflows, and clusters.

• Leverage Delta Lake for reliable, scalable data storage and processing.

• Integrate vector databases, LLM orchestration, and external catalog APIs (MTP, PCdb, VCdb).

• Implement confidence scoring, retry logic, and human-in-the-loop workflows.

• Optimize performance, cost, and scalability across distributed pipelines.

• Ensure auditability, lineage tracking, and data governance compliance.


Requirements/Skills:

• Strong Python, PySpark, and SQL expertise.

• Hands-on Databricks experience (notebooks, workflows, jobs, Delta Lake).

• Experience with large-scale ETL and distributed data processing.

• Working knowledge of LLMs, RAG, embeddings, and vector databases.

• Cloud experience on AWS, Azure, or GCP.

• Experience integrating REST APIs and external data catalogs.

• Familiarity with data governance, monitoring, and logging frameworks.


Interested candidates, please send your resume to hello@cogentibs.com and mention "AI Data Engineer" in the subject line.
