Job Description
Role: Site Reliability Engineer (SRE) – Core IT Infrastructure
Location: Pune
Work mode: On-site (full-time)
Experience: 6+ years
Key Responsibilities
Infrastructure Reliability & Operations
- Design, implement, and maintain highly available and fault-tolerant infrastructure
- Ensure reliability, performance, scalability, and security of core IT systems
- Monitor system health, capacity, and performance using proactive observability practices
- Lead incident response, root cause analysis (RCA), and post-incident reviews
Automation & SRE Development
- Develop and maintain automation tools, scripts, and frameworks to reduce manual operations
- Apply Infrastructure as Code (IaC) principles using tools such as Terraform, Ansible, or CloudFormation
- Build self-healing systems and automate repetitive operational tasks
- Improve deployment pipelines and operational workflows through engineering solutions
DevOps & Platform Engineering
- Collaborate with DevOps, development, and security teams to support CI/CD pipelines
- Enable seamless application deployments with minimal downtime
- Support container and orchestration platforms (Docker, Kubernetes, OpenShift)
- Implement best practices for configuration management and environment consistency
Monitoring, Observability & Performance
- Design and maintain monitoring, logging, and alerting systems
- Define and track SLIs, SLOs, and SLAs
- Optimize system performance and cost efficiency, and carry out capacity planning
- Enhance observability using tools such as Prometheus, Grafana, ELK, Datadog, or similar
Security & Compliance
- Implement infrastructure security best practices
- Collaborate with security teams on vulnerability management and compliance requirements
- Ensure secure access, identity management, and audit readiness
⸻
Required Skills & Qualifications
Technical Skills
- Strong experience in Linux/Unix system administration
- Proficiency in programming/scripting (Python, Go, Bash/shell, or similar)
- Experience with cloud platforms (AWS, Azure, or GCP)
- Hands-on experience with containerization and orchestration
- Knowledge of networking concepts (DNS, TCP/IP, load balancing, firewalls)
- Experience with monitoring, logging, and alerting tools