Job Description
Position Description:
Monitor the Hadoop ecosystem (HDFS, Hive, Spark, Oozie, HBase, Kafka, YARN, MapReduce) using enterprise monitoring tools (e.g., Nagios, Grafana, Cloudera Manager, Ambari).
Perform health checks for Hadoop clusters, nodes, and jobs to ensure high availability and stability.
Respond to alerts, incidents, and service requests, ensuring timely resolution or escalation to L2/L3 teams as per runbooks.
Support ETL batch processing, job scheduling (Control-M, CA7, Oozie, Airflow), and resolve basic job failures.
Execute predefined scripts, queries, and operational procedures for routine tasks.
Document incidents, root causes, and resolutions in the knowledge base for continuous improvement.
Maintain SLA adherence for P1–P4 incidents, including proactive follow-ups and communication with stakeholders.
Collaborate with cross-functional teams (L2 engineers, developers, DBAs, infra teams) to support production stability.
Provide daily/weekly reports on incident trends, recurring issues, and cluster performance.
Skills: