Data Engineer / ML Engineer — Job Description
Location - Gurugram (Onsite)
Salary Budget - Up to 20 LPA
Key Responsibilities
- Design, build, and maintain scalable data pipelines (batch + streaming) using Spark, Hadoop, and other Apache ecosystem tools.
- Develop robust ETL workflows for large-scale data ingestion, transformation, and validation.
- Work with Cassandra, data lakes, and distributed storage systems to handle large-volume datasets.
- Write clean, optimized, and modular Python code for data processing, automation, and machine learning tasks.
- Utilize Linux environments for scripting, performance tuning, and data workflow orchestration.
- Build and manage web scraping pipelines to extract structured and unstructured data from diverse sources.
- Collaborate with ML/AI teams to prepare training datasets, manage feature stores, and support the model lifecycle.
- Implement and experiment with LLMs, LangChain, RAG pipelines, and vector database integrations.
- Assist in fine-tuning models, evaluating model performance, and deploying ML models into production.
- Optimize data workflows for performance, scalability, and fault tolerance.
- Document data flows, transformation logic, and machine learning processes.
- Work cross-functionally with engineering, product, and DevOps teams to ensure reliable, production-grade data systems.
Requirements
- 3–6 years of experience as a Data Engineer, ML Engineer, or in a similar role.
- Strong expertise in advanced Python (data structures, multiprocessing, async, clean architecture).
- Solid experience with:
  - Apache Spark / PySpark
  - Hadoop ecosystem (HDFS, Hive, YARN, HBase, etc.)
  - Cassandra or similar distributed databases
  - Linux (CLI tools, shell scripting, environment management)
- Proven ability to design and implement ETL pipelines and scalable data processing systems.
- Hands-on experience with data lakes, large-scale storage, and distributed systems.
- Experience with web scraping frameworks (BeautifulSoup, Scrapy, Playwright, etc.).
- Familiarity with LangChain, LLMs, RAG, vector stores (FAISS, Pinecone, Milvus), and ML workflow tools.
- Understanding of model training, fine-tuning, and evaluation workflows.
- Strong problem-solving skills, the ability to dig into complex data issues, and a track record of writing production-ready code.
- Experience with cloud environments (AWS/GCP/Azure) is a plus.