Job Description

About the Role
As a Data Engineer II at Baazi, you will contribute to building and optimizing scalable data pipelines and lakehouse systems that power analytics and product insights across the organization. You’ll work hands-on across our AWS-based ecosystem, develop robust data workflows, and ensure high standards of data quality, performance, and reliability.
Key Responsibilities
Build and optimize scalable data pipelines and lakehouse components using Iceberg or Hudi.
Develop ETL/ELT workflows on AWS using Glue, EMR, Lambda, Redshift, and other platform services.
Write clean, modular, reusable code using PySpark, Python, and SQL.
Manage and enhance orchestration workflows with Airflow to ensure reliability and scalability.
Collaborate with analytics, product, and engineering teams to maintain unified and consistent data models.
Participate in performance tuning, cost optimization, and improvement of AWS data infrastructure.
Implement and follow best practices in data quality, cataloging, and metadata management.
Contribute to code reviews and engineering discussions to maintain high technical standards.
Required Skills & Experience
2–4 years of experience in data engineering with strong exposure to large-scale data systems.
2+ years of hands-on experience with PySpark.
Solid understanding of the AWS data ecosystem: Glue, EMR, S3, Lambda, Redshift, CloudWatch.
Practical experience working with Apache Iceberg or Hudi (Iceberg preferred).
Strong programming skills in Python and PySpark, and a solid command of SQL.
Experience working with Airflow for scheduling and orchestration.
Understanding of distributed systems, data modeling, and data governance principles.
Exposure to containerized environments (Kubernetes, Docker) is a plus.
Ability to work closely with business and technical teams to deliver scalable solutions.
