Job Description

Role: Site Reliability Engineer (SRE) – Core IT Infrastructure

Location: Pune

Work mode: On-site (full-time)

Experience: 6+ years


Key Responsibilities


Infrastructure Reliability & Operations


- Design, implement, and maintain highly available and fault-tolerant infrastructure


- Ensure reliability, performance, scalability, and security of core IT systems


- Monitor system health, capacity, and performance using proactive observability practices


- Lead incident response, root cause analysis (RCA), and post-incident reviews


Automation & SRE Development


- Develop and maintain automation tools, scripts, and frameworks to reduce manual operations


- Apply Infrastructure as Code (IaC) principles using tools such as Terraform, Ansible, or CloudFormation


- Build self-healing systems and automate repetitive operational tasks


- Improve deployment pipelines and operational workflows through engineering solutions


DevOps & Platform Engineering


- Collaborate with DevOps, development, and security teams to support CI/CD pipelines


- Enable seamless application deployments with minimal downtime


- Support container and orchestration platforms (Docker, Kubernetes, OpenShift)


- Implement best practices for configuration management and environment consistency


Monitoring, Observability & Performance


- Design and maintain monitoring, logging, and alerting systems


- Define and track SLIs, SLOs, and SLAs


- Optimize system performance and cost efficiency, and carry out capacity planning


- Enhance observability using tools such as Prometheus, Grafana, ELK, Datadog, or similar


Security & Compliance


- Implement infrastructure security best practices


- Collaborate with security teams on vulnerability management and compliance requirements


- Ensure secure access, identity management, and audit readiness



Required Skills & Qualifications


Technical Skills


- Strong experience in Linux/Unix system administration


- Proficiency in programming/scripting (Python, Go, Bash or other shell scripting, or similar)


- Experience with cloud platforms (AWS, Azure, or GCP)


- Hands-on experience with containerization and orchestration


- Knowledge of networking concepts (DNS, TCP/IP, load balancing, firewalls)


- Experience with monitoring, logging, and alerting tools
