Job Description

About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

About TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.
TMUS India Private Limited operates as TMUS Global Solutions.

Job Overview:
At T-Mobile, we don’t just build technology; we empower people. We believe in investing in YOU: your growth, your impact, and your future. We’re unstoppable when individuals like you come together to solve bold challenges, inspire innovation, and build platforms that serve millions.
As a Senior Site Reliability Engineer (SRE), you will help ensure the availability, performance, and stability of platforms powering T-Mobile’s finance, credit, collections, document management, and supply chain systems. You will collaborate with application developers, DevOps, and cloud teams to build reliable, observable, and automated systems. This role is ideal for engineers passionate about operational excellence, learning distributed systems, and scaling production environments using code and data.

Key Responsibilities:
Reliability Engineering & Operations:
Contribute to the availability and performance of large-scale, customer-facing systems through monitoring, alerting, and incident response.
Assist in designing and implementing resiliency strategies, including health checks, failovers, circuit breakers, and retries.
Participate in on-call rotations, help triage incidents, and assist in root cause analysis and post-incident reviews.

Automation & CI/CD Support:
Develop scripts, tools, and automation to reduce manual toil and improve operational efficiency.
Support infrastructure deployment and service rollout via CI/CD pipelines and Infrastructure-as-Code workflows (e.g., Terraform, Helm).
Work with developers to improve service deployment, configuration management, and rollback strategies.

Observability & Metrics:
Help build and maintain dashboards, alerts, and logs that provide visibility into system health and application behavior.
Use tools such as Prometheus, Grafana, Splunk, or OpenTelemetry to monitor services and infrastructure.
Analyze system performance data to guide optimizations and proactively detect issues.

Cross-Team Collaboration:
Work with DevOps, SREs, and software engineers to ensure that services are built for reliability and observability.
Contribute to documentation, runbooks, playbooks, and operational readiness reviews.
Support development teams in designing systems that meet SLOs and operational standards.

Qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related technical field.
8+ years of experience in infrastructure, operations, DevOps, or SRE roles.
Proficiency in scripting or programming languages such as Java, Python, Go, or Bash.
Strong familiarity with Linux systems, container orchestration (Kubernetes), and cloud platforms (Azure preferred; GCP also relevant).
Hands-on experience with monitoring and observability tools such as Grafana, Splunk, and OpenTelemetry.
Expertise in Kubernetes and container orchestration, including Docker templates, Helm charts, and GitLab templates.
Knowledge of authentication, authorization, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates, and PGP.
Solid understanding of incident management tools such as ServiceNow.

Preferred Skills:
Exposure to incident management frameworks, including alerting, escalation, and postmortem practices.
Understanding of SRE principles: SLOs, SLIs, and error budgets.
Familiarity with tools like HAProxy, Envoy Proxy, Kafka, RabbitMQ, or other core infrastructure components.
Experience with performance tuning of Kubernetes runtime components.
Experience with CI/CD systems (e.g., GitLab CI/CD, Jenkins, Spinnaker).

Knowledge, Skills, and Abilities:
Strong problem-solving and analytical skills for diagnosing issues in distributed systems.
A growth mindset with a passion for learning observability, automation, and platform engineering best practices.
Strong communication skills and the ability to collaborate across teams.
Drive to improve system reliability, developer productivity, and customer experience.

Why Join T-Mobile India?
At T-Mobile India, you won’t just contribute to world-class technology; you’ll help build it. You’ll work with global leaders, solve complex system challenges, and build platforms that redefine how technology powers customer experience.
We’re more than just a telecom company: we’re a technology powerhouse leading the way in AI, data, and digital innovation. And we do it all with heart, grit, and a passion for empowering people.
Join us and shape the future of intelligent platforms that serve millions — at the scale and speed of T-Mobile.

Disclaimer:

TMUS India Private Limited, operating as TMUS Global Solutions, has engaged ANSR, Inc. ("ANSR") as its exclusive recruiting partner. That means that any communications regarding TMUS Global Solutions opportunities or employment offers will be issued only through ANSR and the 1Recruit platform. If you receive a communication or offer from another individual or entity, please notify TMUS Global Solutions immediately.
TMUS Global Solutions will never seek any payment or other compensation during the hiring process or request sensitive personal data (such as bank details or government-issued identification numbers) prior to a candidate’s acceptance of a formal offer.
