Job Description
Role: Data Engineer
Key Skills: PySpark, Cloudera Data Platform, Big Data (Hadoop, Hive, Kafka)
Responsibilities
- Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform (CDP), ensuring data integrity and accuracy.
- Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) into the data lake or data warehouse on CDP.
- Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
- Performance Optimization: Tune the performance of PySpark code and Cloudera components, optimizing resource utilization and reducing the runtime of ETL processes.
- Data Quality and Validation: Implement data quality checks, monitoring,...
Apply for this Position
Ready to join Virtusa? Click the button below to submit your application.
Submit Application