Job Description
About the Role
As a Data Engineer II at Baazi, you will contribute to building and optimizing scalable data pipelines and lakehouse systems that power analytics and product insights across the organization. You’ll work hands-on across our AWS-based ecosystem, develop robust data workflows, and ensure high standards of data quality, performance, and reliability.
Key Responsibilities
Build and optimize scalable data pipelines and lakehouse components using Iceberg or Hudi.
Develop ETL/ELT workflows on AWS using Glue, EMR, Lambda, Redshift, and other platform services.
Write clean, modular, reusable code using PySpark, Python, and SQL.
Manage and enhance orchestration workflows with Airflow to ensure reliability and scalability.
Collaborate with analytics, product, and engineering teams to maintain unified and consistent data models.
Participate in performance tuning, cost optimization, and improvement of AWS data infrastructure.
Implement and follow best practices in data quality, cataloging, and metadata management.
Contribute to code reviews and engineering discussions to maintain high technical standards.
Required Skills & Experience
2–4 years of experience in data engineering with strong exposure to large-scale data systems.
2+ years of hands-on experience in PySpark.
Solid understanding of the AWS data ecosystem: Glue, EMR, S3, Lambda, Redshift, CloudWatch.
Practical experience working with Apache Iceberg or Hudi (Iceberg preferred).
Strong programming skills in Python and PySpark, and a solid command of SQL.
Experience working with Airflow for scheduling and orchestration.
Understanding of distributed systems, data modeling, and data governance principles.
Exposure to containerized environments (Kubernetes, Docker) is a plus.
Ability to work closely with business and technical teams to deliver scalable solutions.
Apply for this Position
Ready to join? Click the button below to submit your application.