Job Description

๐๐จ๐ฌ๐ข๐ญ๐ข๐จ๐ง: AI Data Engineer

๐„๐ฆ๐ฉ๐ฅ๐จ๐ฒ๐ฆ๐ž๐ง๐ญ ๐“๐ฒ๐ฉ๐ž: Full-Time with Cogent IBS

๐–๐จ๐ซ๐ค ๐Œ๐จ๐๐ž: Remote

๐๐ž๐ž๐ ๐ซ๐ž๐ฌ๐จ๐ฎ๐ซ๐œ๐ž๐ฌ ๐Ÿ๐ซ๐จ๐ฆ ๐ˆ๐ง๐๐ข๐š (primarily) ๐š๐ง๐ ๐‹๐€๐“๐€๐Œ ๐œ๐จ๐ฎ๐ง๐ญ๐ซ๐ข๐ž๐ฌ (secondary)


๐‘๐ž๐ฌ๐ฉ๐จ๐ง๐ฌ๐ข๐›๐ข๐ฅ๐ข๐ญ๐ข๐ž๐ฌ:


- Design and implement the Agentic RAG architecture for large-scale data standardization.


- Build scalable ETL pipelines using PySpark and Databricks for processing 1.2B+ records.


- Design, develop, and optimize Databricks jobs, workflows, and clusters.


- Leverage Delta Lake for reliable, scalable data storage and processing.


- Integrate vector databases, LLM orchestration, and external catalog APIs (MTP, PCdb, VCdb).


- Implement confidence scoring, retry logic, and human-in-the-loop workflows.


- Optimize performance, cost, and scalability across distributed pipelines.


- Ensure auditability, lineage tracking, and data governance compliance.


๐‘๐ž๐ช๐ฎ๐ข๐ซ๐ž๐ฆ๐ž๐ง๐ญ๐ฌ/๐’๐ค๐ข๐ฅ๐ฅ๐ฌ:


- Strong Python, PySpark, and SQL expertise.


- Hands-on Databricks experience (notebooks, workflows, jobs, Delta Lake).


- Experience with large-scale ETL and distributed data processing.


- Working knowledge of LLMs, RAG, embeddings, and vector databases.


- Cloud experience on AWS, Azure, or GCP.


- Experience integrating REST APIs and external data catalogs.


- Familiarity with data governance, monitoring, and logging frameworks.


Interested candidates, please send your resume to hello@cogentibs.com and mention "AI Data Engineer" in the subject line.
