Job Description
Team Leadership & Management
- Lead, mentor, and support a team of NOC engineers across all shifts.
- Set priorities, distribute tasks, and ensure proper workload balance within the team.
- Drive professional development through training, coaching, and ongoing feedback.
- Conduct periodic 1:1 meetings, performance evaluations, and goal-setting.
- Recruit, onboard, and integrate new NOC engineers into the team.
- Build and maintain a culture of accountability, high performance, and service quality.
Operational Oversight
- Own the day-to-day operations of the entire NOC function, ensuring consistent monitoring, alert handling, and operational routine execution.
- Ensure all teams consistently follow predefined procedures, escalation paths, and runbooks.
- Validate and improve health checks, monitoring dashboards, and operational KPIs.
- Oversee shift handovers, ensuring accuracy, clarity, and continuity of operations.
Incident Management
- Serve as the primary incident coordinator for major incidents (P1/P2) and oversee response efforts across shifts.
- Ensure correct triage, prioritization, and mitigation actions by the team.
- Coordinate escalation to Tier 2/3, Infrastructure, Security, and relevant stakeholders.
- Lead post-incident reviews, ensuring documentation, root cause analysis, and follow-up action items are completed.
Service Quality & Continuous Improvement
- Monitor team performance, SLAs, and KPIs; ensure targets are met or exceeded.
- Identify recurring issues, monitoring gaps, or operational inefficiencies and drive improvement initiatives.
- Own and update NOC processes, SOPs, runbooks, and documentation.
- Collaborate with cross-functional teams (Infrastructure, Networking, Security, DevOps, etc.) to enhance system reliability and monitoring coverage.
- Proactively recommend improvements to monitoring, alerting, automation, and NOC workflows.
Communication & Reporting
- Provide clear and consistent communication to management regarding incidents, trends, risks, and operational status.
- Deliver daily/weekly operational reports, including incident summaries and team performance insights.
- Represent the NOC function in internal meetings, service reviews, and cross-team coordination sessions.
Desired Background
- Proven experience leading or managing technical teams in a NOC, Operations, or Monitoring environment.
- Minimum of 10 years of experience in a NOC and 3+ years of experience leading a team.
- Strong troubleshooting expertise across network, system, and cloud environments.
- Knowledge of key network protocols (TCP, UDP, DNS, HTTP/S, SSH, BGP fundamentals).
- Experience with Linux system administration (logs, services, resource usage, shell) and Windows Server fundamentals.
- Familiarity with cloud platforms (AWS, Azure, GCP) and cloud monitoring concepts.
- Hands-on experience with monitoring and alerting platforms such as Icinga, Prometheus, Grafana, PagerDuty, or equivalent.
- Ability to interpret logs, alerts, metrics, and telemetry data and guide the team in troubleshooting.
- Understanding of VPNs, firewalls, load balancers, proxies, and general IT infrastructure.
- Experience with ticketing and incident management tools (e.g., Jira, ServiceNow).
- Excellent communication skills, high situational awareness, and calm decision-making under pressure.
- High proficiency in English (written and verbal).