Job Description

Responsibilities:

  • Lead resolution of high-severity/complex incidents across hybrid infrastructure.
  • Architect and implement automation frameworks, self-healing workflows, and AI-driven ops.
  • Define SRE best practices, reliability SLIs/SLOs/SLAs, and operational standards.
  • Partner with application and platform engineering teams to improve resilience.
  • Drive observability maturity: predictive monitoring, anomaly detection, automated RCA.
  • Own continuous improvement of the runbooks and automation pipelines used by Engineers and Sr Engineers.
  • Provide technical leadership, mentor junior SREs, and conduct training.
  • Identify new technologies, tools, and processes that elevate operational excellence.

Skills:

Mandatory Skills (Must-Have):

Incident Command & Complex Troubleshooting:

  • Expectation: Take leadership during high-s...

Apply for this Position

Ready to join TMUS Global Solutions? Click the button below to submit your application.
