Job Description
- 5+ years in observability, monitoring, or reliability engineering roles.
- Hands-on experience with common observability tools such as Prometheus, Grafana, Splunk, and Coralogix, and with external monitoring tools (e.g., Catchpoint, ThousandEyes).
- Strong scripting skills in Python, plus Bash or PowerShell for automation.
- Experience with Terraform and Ansible for infrastructure automation.
- Solid understanding of SLIs, SLOs, error budgets, and reliability engineering principles.
- Familiarity with Linux environments and distributed systems.
- Design and implement a Universal Dashboard in Grafana for leadership and engineering visibility.
- Ensure a consistent look and feel across all observability views.
- Define and implement SLIs, SLOs, and error budgets for critical services.
- Establish alerting thresholds and escalation workflows aligned with reliability goals.
- Integrate anomaly detection and AI-assisted insights into the observability platform.
-...
Apply for this Position
Ready to join Concentrix? Click the button below to submit your application.