Job Description
Job Overview:
At T-Mobile, we don't just build technology; we empower people. We believe in investing in YOU: your growth, your impact, and your future. We're unstoppable when individuals like you come together to solve bold challenges, inspire innovation, and build platforms that serve millions.
As a Senior Site Reliability Engineer (SRE), you will help ensure the availability, performance, and stability of platforms powering T-Mobile's finance, credit, collections, document management, and supply chain systems. You will collaborate with application developers, DevOps, and cloud teams to build reliable, observable, and automated systems. This role is ideal for engineers passionate about operational excellence, learning distributed systems, and scaling production environments using code and data.
Key Responsibilities:
Reliability Engineering & Operations:
- Contribute to the availability and performance of large-scale, customer-facing systems through monitoring, alerting, and incident response.
- Assist in designing and implementing resiliency strategies, including health checks, failovers, circuit breakers, and retries.
- Participate in on-call rotations, help triage incidents, and assist in root cause analysis and post-incident reviews.
Automation & CI/CD Support:
- Develop scripts, tools, and automation to reduce manual toil and improve operational efficiency.
- Support infrastructure deployment and service rollout via CI/CD pipelines and Infrastructure-as-Code workflows (e.g., Terraform, Helm).
- Work with developers to improve service deployment, configuration management, and rollback strategies.
Observability & Metrics:
- Help build and maintain dashboards, alerts, and logs that provide visibility into system health and application behavior.
- Use tools such as Prometheus, Grafana, Splunk, or OpenTelemetry to monitor services and infrastructure.
- Analyze system performance data to guide optimizations and proactively detect issues.
Cross-Team Collaboration:
- Work with DevOps, SREs, and software engineers to ensure that services are built for reliability and observability.
- Contribute to documentation, runbooks, playbooks, and operational readiness reviews.
- Support development teams in designing systems that meet SLOs and operational standards.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related technical field.
- 8+ years of experience in infrastructure, operations, DevOps, or SRE roles.
- Proficiency in scripting or programming languages such as Java, Python, Go, and Bash.
- Strong familiarity with Linux systems, container orchestration (Kubernetes), and cloud platforms (Azure preferred; AWS/GCP also relevant).
- Hands-on experience with monitoring and observability tools such as Grafana, Splunk, and OpenTelemetry.
- Expertise in Kubernetes and container orchestration, including Docker templates, Helm charts, and GitLab templates.
- Knowledge of authentication, authorization, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates, and PGP.
- Solid understanding of incident management tools such as ServiceNow.
Preferred Skills:
- Exposure to incident management frameworks, including alerting, escalation, and postmortem practices.
- Understanding of SRE principles: SLOs, SLIs, and error budgets.
- Familiarity with tools like HAProxy, Envoy Proxy, Kafka, RabbitMQ, or other core infrastructure components.
- Experience with performance tuning of Kubernetes runtime components.
- Experience with CI/CD systems (e.g., GitLab CI/CD, Jenkins, Spinnaker).
Knowledge, Skills, and Abilities:
- Strong problem-solving and analytical skills for diagnosing issues in distributed systems.
- A growth mindset with a passion for learning observability, automation, and platform engineering best practices.
- Strong communication skills and the ability to collaborate across teams.
- Drive to improve system reliability, developer productivity, and customer experience.
Why Join T-Mobile India?
At T-Mobile India, you won't just contribute to world-class technology; you'll help build it. You'll work with global leaders, solve complex system challenges, and build platforms that redefine how technology powers customer experience.
We're more than just a telecom company; we're a technology powerhouse leading the way in AI, data, and digital innovation. And we do it all with heart, grit, and a passion for empowering people.
Join us and shape the future of intelligent platforms that serve millions at the scale and speed of T-Mobile.
TMUS India Private Limited, operating as TMUS Global Solutions, has engaged ANSR, Inc. ("ANSR") as its exclusive recruiting partner. That means that any communications regarding TMUS Global Solutions opportunities or employment offers will be issued only through ANSR and the 1Recruit platform. If you receive a communication or offer from another individual or entity, please notify TMUS Global Solutions immediately.
TMUS Global Solutions will never seek any payment or other compensation during the hiring process or request sensitive personal data (such as bank details or government-issued identification numbers) before a candidate accepts a formal offer.