Job Description
Technical Skills Required:
- At least intermediate proficiency in AWS ETL (Glue, Lambda, Batch, EMR), API development (FastAPI + core Python), and Redshift
- AWS Redshift:
  - Experience creating provisioned clusters
  - Expert knowledge of workload management
  - Materialized views, DDL optimizations, JSON handling
  - AWS Glue to Redshift loads
  - Row- and column-level security
- AWS Glue (ETL jobs, crawlers, Data Catalog including Iceberg, workflows, DynamicFrames)
- Experience with batch and streaming workloads
- Performance Tuning & Cost Optimization
- IAM, KMS, Secrets Manager, and fine-grained access control (encryption good to have)
- PySpark, Python
- Lambda, Amazon S3, Athena, RDS
- Apache Parquet, JSON, CSV
- Data Lake Design & Implementation
Good to have:
- CI/CD (Terraform or AWS CDK or CloudFormation)
- Data Lineage & Governance
- Kinesis/Kafka/Glue Streaming/AWS Batch
Key Responsibilities:
- Collaborate with business stakeholders to analyze data requirements and define ETL & API architecture aligned with business goals.
- Design and implement scalable ETL workflows using AWS Glue and PySpark for ingesting structured and semi-structured data into the AWS data lake.
- Develop reusable Glue jobs and crawlers for automated metadata cataloging and data transformations.
- Optimize Glue job performance using dynamic frame partitioning, job bookmarking, and parallelism tuning.
- Integrate Glue with other AWS services such as S3, Athena, Redshift, Lambda, and CloudWatch for end-to-end orchestration and monitoring.
- Lead the data lakehouse implementation using Glue with Apache Iceberg for versioned, transactional data storage.
- Ensure secure access to datasets using fine-grained IAM policies and Lake Formation (good to have).
- Mentor junior engineers, enforce coding best practices, and participate in code reviews and architectural discussions.