Job Description

Role: Site Reliability Engineer (SRE) – Core IT Infrastructure

Location: Mumbai

Work mode: On-site (full-time)

Experience: 6+ years


Key Responsibilities


Infrastructure Reliability & Operations

• Design, implement, and maintain highly available and fault-tolerant infrastructure

• Ensure reliability, performance, scalability, and security of core IT systems

• Monitor system health, capacity, and performance using proactive observability practices

• Lead incident response, root cause analysis (RCA), and post-incident reviews


Automation & SRE Development

• Develop and maintain automation tools, scripts, and frameworks to reduce manual operations

• Apply Infrastructure as Code (IaC) principles using tools such as Terraform, Ansible, or CloudFormation

• Build self-healing systems and automate repetitive operational tasks

• Improve deployment pipelines and operational workflows through engineering solutions


DevOps & Platform Engineering

• Collaborate with DevOps, development, and security teams to support CI/CD pipelines

• Enable seamless application deployments with minimal downtime

• Support containerization and orchestration platforms (Docker, Kubernetes, OpenShift)

• Implement best practices for configuration management and environment consistency


Monitoring, Observability & Performance

• Design and maintain monitoring, logging, and alerting systems

• Define and track SLIs, SLOs, and SLAs

• Optimize system performance, capacity planning, and cost efficiency

• Enhance observability using tools such as Prometheus, Grafana, ELK, Datadog, or similar


Security & Compliance

• Implement infrastructure security best practices

• Collaborate with security teams on vulnerability management and compliance requirements

• Ensure secure access, identity management, and audit readiness



Required Skills & Qualifications


Technical Skills

• Strong experience in Linux/Unix system administration

• Proficiency in programming/scripting (Python, Go, Bash/shell, or similar)

• Experience with cloud platforms (AWS, Azure, or GCP)

• Hands-on experience with containerization and orchestration

• Knowledge of networking concepts (DNS, TCP/IP, load balancing, firewalls)

• Experience with monitoring, logging, and alerting tools
