Job Description

Responsibilities

• Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance

• Implement comprehensive logging, alerting, and monitoring systems using application monitoring tools

• Perform regular health checks on performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively

• Manage incident response procedures for pipeline failures, including root cause analysis, resolution, and post-incident reviews

• Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment

• Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency

• Maintain comprehensive documentation for operational procedures, runbooks, and troubleshooting guides

• Coordinate scheduled maintenance windows and system upgrades with minimal business i...

Apply for this Position

Ready to join Jobline Resources Pte. Ltd.? Submit your application.