Job Description
Technical Skills Required:
At least intermediate level in AWS ETL (Glue, Lambda, Batch, EMR), API development (FastAPI + core Python), and Redshift (see the FastAPI sketch after this skills section)
AWS Redshift:
Experience creating provisioned clusters
Expert knowledge of workload management (WLM)
Materialized views, DDL optimizations, JSON handling
AWS Glue to Redshift loads (see the COPY sketch after this list)
Row- and column-level security
AWS Glue (ETL, jobs, crawlers, Data Catalog including Iceberg, workflows, DynamicFrames)
Experience with batch and streaming workloads
Performance tuning & cost optimization
IAM, KMS, Secrets Manager, and fine-grained access control (encryption good to have)
PySpark, Python
Lambda, Amazon S3, Athena, RDS
Apache Parquet, JSON, CSV
Data Lake Design & Implementation
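The COPY sketch referenced in the Redshift items above: a minimal, hedged illustration of loading S3 data into a provisioned cluster through the Redshift Data API. The cluster identifier, database, user, table, S3 path, and IAM role ARN are all hypothetical placeholders, not values from this posting.

    import boto3

    # Run a COPY from S3 into Redshift via the Data API ("redshift-data").
    client = boto3.client("redshift-data")

    response = client.execute_statement(
        ClusterIdentifier="analytics-cluster",  # placeholder provisioned cluster
        Database="dev",
        DbUser="etl_user",
        Sql=(
            "COPY curated.orders "
            "FROM 's3://example-datalake/curated/orders/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
            "FORMAT AS PARQUET;"
        ),
    )

    # The Data API is asynchronous; poll describe_statement until completion.
    status = client.describe_statement(Id=response["Id"])["Status"]
    print(status)

The Data API avoids holding JDBC connections open from Glue or Lambda, which is one common way to implement Glue-to-Redshift loads; it is shown here as an assumption, not as the team's actual pattern.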
Good to have:
CI/CD (Terraform, AWS CDK, or CloudFormation)
Data Lineage & Governance
Kinesis/Kafka/Glue Streaming/AWS Batch
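For the API development requirement above, a minimal FastAPI sketch in core Python follows. The endpoint, model fields, and in-memory store are illustrative assumptions standing in for a Redshift- or Athena-backed lookup.

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="Data Platform API")

    class Order(BaseModel):
        order_id: str
        amount: float

    # In-memory stand-in for a warehouse-backed lookup (assumption).
    _STORE = {"o-1001": Order(order_id="o-1001", amount=42.5)}

    @app.get("/orders/{order_id}", response_model=Order)
    def get_order(order_id: str) -> Order:
        order = _STORE.get(order_id)
        if order is None:
            raise HTTPException(status_code=404, detail="order not found")
        return order

Saved as main.py, this runs locally with "uvicorn main:app --reload" and serves validated JSON at /orders/{order_id}.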
Key Responsibilities:
Collaborate with business stakeholders to analyze data requirements and define ETL & API architecture aligned with business goals.
Design and implement scalable ETL workflows using AWS Glue and PySpark for ingesting structured and semi-structured data into the AWS data lake (see the Glue job sketch after this list).
Develop reusable Glue jobs and crawlers for automated metadata cataloging and data transformations.
Optimize Glue job performance using dynamic frame partitioning, job bookmarking, and parallelism tuning.
Integrate Glue with other AWS services such as S3, Athena, Redshift, Lambda, and CloudWatch for end-to-end orchestration and monitoring.
Lead data lakehouse implementation leveraging Glue with Iceberg for versioned, transactional data storage (see the Iceberg sketch after this list).
Ensure secure access to datasets using fine-grained IAM policies and Lake Formation (good to have).
Mentor junior engineers, enforce coding best practices, and participate in code reviews and architectural discussions.
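The Glue job sketch referenced above: a minimal, assumption-laden illustration of the ETL and tuning responsibilities listed. It shows a DynamicFrame read from the Data Catalog with a partition push-down predicate, a field mapping, a repartition to control write parallelism, and a partitioned Parquet write to S3, with transformation_ctx values set so job bookmarks work. Database, table, field, and bucket names are placeholders.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Standard Glue job scaffolding; JOB_NAME is supplied by the Glue runtime.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a catalogued table as a DynamicFrame. The push-down predicate prunes
    # partitions at read time, and transformation_ctx enables job bookmarking so
    # reruns pick up only new data (bookmarks must also be enabled on the job).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db",                                # placeholder database
        table_name="orders_json",                         # placeholder table
        push_down_predicate="order_date >= '2024-01-01'",
        transformation_ctx="source",
    )

    # Cast and rename fields; the mapping is illustrative only.
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[
            ("order_id", "string", "order_id", "string"),
            ("amount", "string", "amount", "double"),
            ("order_date", "string", "order_date", "string"),
        ],
        transformation_ctx="mapped",
    )

    # Control output parallelism and file counts before writing.
    repartitioned = mapped.repartition(32)

    # Write partitioned Parquet to the data lake (placeholder bucket/path).
    glue_context.write_dynamic_frame.from_options(
        frame=repartitioned,
        connection_type="s3",
        connection_options={
            "path": "s3://example-datalake/curated/orders/",
            "partitionKeys": ["order_date"],
        },
        format="parquet",
        transformation_ctx="sink",
    )

    job.commit()  # persists bookmark state for the next run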
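And the Iceberg sketch referenced above: a hedged illustration of a versioned, transactional write using Glue's native Iceberg support (Glue 4.0+ launched with --datalake-formats iceberg). It is written as a standalone Spark application for clarity; inside a Glue job these confs are usually supplied as job parameters. The catalog name, warehouse path, and table names are assumptions, and the target table is assumed to already exist as an Iceberg table.

    from pyspark.sql import SparkSession

    # Spark session wired to an Iceberg catalog backed by the Glue Data Catalog.
    spark = (
        SparkSession.builder
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.glue_catalog",
                "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue_catalog.catalog-impl",
                "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue_catalog.io-impl",
                "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.glue_catalog.warehouse",
                "s3://example-datalake/warehouse/")  # placeholder path
        .getOrCreate()
    )

    # Stage some incoming rows as a temp view (illustrative data).
    incoming = spark.createDataFrame([("o-1001", 42.5)], ["order_id", "amount"])
    incoming.createOrReplaceTempView("updates")

    # Transactional upsert into an existing Iceberg table (placeholder name).
    spark.sql("""
        MERGE INTO glue_catalog.curated.orders t
        USING updates s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

    # "Versioned" storage means every commit is a snapshot you can query back to.
    spark.sql(
        "SELECT * FROM glue_catalog.curated.orders "
        "TIMESTAMP AS OF '2024-06-01 00:00:00'"
    ).show()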