Job Description
- Manage, monitor, and improve application reliability, scalability, and performance.
- Implement and maintain monitoring, alerting, and observability tools (Dynatrace, Kibana, CloudWatch).
- Troubleshoot production issues and drive root cause analysis (RCA) for incidents.
- Automate operational processes using scripting (Python, Shell, or similar).
- Collaborate with development and DevOps teams to improve CI/CD and infrastructure reliability.
- Ensure high system uptime through proactive performance tuning and incident management.
- Work with AWS services (EC2, ECS, EKS, Lambda, S3, CloudWatch, etc.) for deployment and monitoring.
- Participate in on-call rotation and production support as required.
- Support Java and microservices-based environments, ensuring efficient scaling and health monitoring.
- Maintain documentation for SRE processes, runbooks, and automation workflows.
Skills Required
Monitoring Tools...
Apply for this Position
Ready to join Pathfinders Global P Ltd? Submit your application.