Job Description
Position: AI Data Engineer
Employment Type: Full-Time with Cogent IBS
Work Mode: Remote
Need resources from India (primary) and APAC countries (secondary)
Responsibilities:
- Design and implement the Agentic RAG architecture for large-scale data standardization.
- Build scalable ETL pipelines using PySpark and Databricks for processing 1.2B+ records.
- Design, develop, and optimize Databricks jobs, workflows, and clusters.
- Leverage Delta Lake for reliable, scalable data storage and processing.
- Integrate vector databases, LLM orchestration, and external catalog APIs (MTP, PCdb, VCdb).
- Implement confidence scoring, retry logic, and human-in-the-loop workflows.
- Optimize performance, cost, and scalability across distributed pipelines.
- Ensure auditability, lineage tracking, and data governance compliance.
Requirements/Skills:
- Strong Python, PySpark, and SQL expertise.
- Hands-on Databricks experience (notebooks, workflows, jobs, Delta Lake).
- Experience with large-scale ETL and distributed data processing.
- Working knowledge of LLMs, RAG, embeddings, and vector databases.
- Cloud experience on AWS, Azure, or GCP.
- Experience integrating REST APIs and external data catalogs.
- Familiarity with data governance, monitoring, and logging frameworks.
Interested candidates, please send your resume to ๐๐๐๐๐@๐๐๐๐๐๐๐๐๐.๐๐๐ and mention "AI Data Engineer" in the subject line.