Data Engineer / ML Engineer — Job Description
Location - Gurugram (Onsite)
Salary Budget - Up to 20 LPA
Key Responsibilities
Design, build, and maintain scalable data pipelines (batch + streaming) using Spark, Hadoop, and other Apache ecosystem tools.
Develop robust ETL workflows for large-scale data ingestion, transformation, and validation.
Work with Cassandra, data lakes, and distributed storage systems to handle large-volume datasets.
Write clean, optimized, and modular Python code for data processing, automation, and machine learning tasks.
Utilize Linux environments for scripting, performance tuning, and data workflow orchestration.
Build and manage web scraping pipelines to extract structured and unstructured data from diverse sources.
Collaborate with ML/AI teams to prepare training datasets, manage feature stores, and support the model lifecycle.
Implement and experiment with LLMs, LangChain, RAG pipelines, and vector database integrations.
Assist in fine-tuning models, evaluating model performance, and deploying ML models into production.
Optimize data workflows for performance, scalability, and fault tolerance.
Document data flows, transformation logic, and machine learning processes.
Work cross-functionally with engineering, product, and DevOps teams to ensure reliable, production-grade data systems.
Requirements
3–6 years of experience as a Data Engineer, ML Engineer, or in a similar role.
Strong expertise in advanced Python (data structures, multiprocessing, async, clean architecture).
Solid experience with:
Apache Spark / PySpark
Hadoop ecosystem (HDFS, Hive, YARN, HBase, etc.)
Cassandra or similar distributed databases
Linux (CLI tools, shell scripting, environment management)
Proven ability to design and implement ETL pipelines and scalable data processing systems.
Hands-on experience with data lakes, large-scale storage, and distributed systems.
Experience with web scraping frameworks (BeautifulSoup, Scrapy, Playwright, etc.).
Familiarity with LangChain, LLMs, RAG, vector stores (FAISS, Pinecone, Milvus), and ML workflow tools.
Understanding of model training, fine-tuning, and evaluation workflows.
Strong problem-solving skills, with the ability to dive deep into complex data issues and write production-ready code.
Experience with cloud environments (AWS/GCP/Azure) is a plus.