Job Description
About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.
About TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.
TMUS India Private Limited operates as TMUS Global Solutions.
Job Overview:
At T-Mobile, we don’t just build technology; we empower people. We believe in investing in YOU: your growth, your impact, and your future. We’re unstoppable when individuals like you come together to solve bold challenges, inspire innovation, and build platforms that serve millions.
As a Senior Site Reliability Engineer (SRE), you will help ensure the availability, performance, and stability of platforms powering T-Mobile’s finance, credit, collections, document management, and supply chain systems. You will collaborate with application developers, DevOps, and cloud teams to build reliable, observable, and automated systems. This role is ideal for engineers passionate about operational excellence, learning distributed systems, and scaling production environments using code and data.
Key Responsibilities:
Reliability Engineering & Operations:
Contribute to the availability and performance of large-scale, customer-facing systems through monitoring, alerting, and incident response.
Assist in designing and implementing resiliency strategies, including health checks, failovers, circuit breakers, and retries.
Participate in on-call rotations, help triage incidents, and assist in root cause analysis and post-incident reviews.
Automation & CI/CD Support:
Develop scripts, tools, and automation to reduce manual toil and improve operational efficiency.
Support infrastructure deployment and service rollout via CI/CD pipelines and Infrastructure-as-Code workflows (e.g., Terraform, Helm).
Work with developers to improve service deployment, configuration management, and rollback strategies.
Observability & Metrics:
Help build and maintain dashboards, alerts, and logs that provide visibility into system health and application behavior.
Use tools such as Prometheus, Grafana, Splunk, or OpenTelemetry to monitor services and infrastructure.
Analyze system performance data to guide optimizations and proactively detect issues.
Cross-Team Collaboration:
Work with DevOps, SREs, and software engineers to ensure that services are built for reliability and observability.
Contribute to documentation, runbooks, playbooks, and operational readiness reviews.
Support development teams in designing systems that meet SLOs and operational standards.
Qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related technical field.
8+ years of experience in infrastructure, operations, DevOps, or SRE roles.
Proficiency in scripting or programming languages such as Java, Python, Go, and Bash.
Strong familiarity with Linux systems, container orchestration (Kubernetes), and cloud platforms (Azure preferred; GCP also relevant).
Hands-on experience with monitoring and observability tools such as Grafana, Splunk, and OpenTelemetry.
Expertise in Kubernetes and container orchestration, including Docker templates, Helm charts, and GitLab templates.
Knowledge of authentication, authorization, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates, and PGP.
Solid understanding of incident management tools such as ServiceNow.
Preferred Skills:
Exposure to incident management frameworks, including alerting, escalation, and postmortem practices.
Understanding of SRE principles: SLOs, SLIs, and error budgets.
Familiarity with tools like HAProxy, Envoy Proxy, Kafka, RabbitMQ, or other core infrastructure components.
Experience with performance tuning of Kubernetes runtime components.
Experience with CI/CD systems (e.g., GitLab CI/CD, Jenkins, Spinnaker).
Knowledge, Skills, and Abilities:
Strong problem-solving and analytical skills for diagnosing issues in distributed systems.
A growth mindset with a passion for learning observability, automation, and platform engineering best practices.
Strong communication skills and the ability to collaborate across teams.
Drive to improve system reliability, developer productivity, and customer experience.
Why Join T-Mobile India?
At T-Mobile India, you won’t just contribute to world-class technology; you’ll help build it. You’ll work with global leaders, solve complex system challenges, and build platforms that redefine how technology powers customer experience.
We’re more than just a telecom company. We’re a technology powerhouse leading the way in AI, data, and digital innovation, and we do it all with heart, grit, and a passion for empowering people.
Join us and shape the future of intelligent platforms that serve millions — at the scale and speed of T-Mobile.
Disclaimer:
TMUS India Private Limited, operating as TMUS Global Solutions, has engaged ANSR, Inc. ("ANSR") as its exclusive recruiting partner. That means that any communications regarding TMUS Global Solutions opportunities or employment offers will be issued only through ANSR and the 1Recruit platform. If you receive a communication or offer from another individual or entity, please notify TMUS Global Solutions immediately.
TMUS Global Solutions will never seek any payment or other compensation during the hiring process or request sensitive personal data (such as bank details or government-issued identification numbers) prior to a candidate’s acceptance of a formal offer.