Job Description

Overall Responsibilities:

  • Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform (CDP), ensuring data integrity and accuracy.
  • Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
  • Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
  • Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing runtime of ETL processes.
  • Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
  • A...