Job Description

Experience: 8+ years


 Solid understanding of Google SRE principles and practices.

 Hands-on experience implementing SLIs, SLOs, and error budgets.

 Hands-on automation experience, preferably in Python (Observability as Code).

 Expertise in incident management, postmortems, and reliability improvement cycles.

 Experience with monitoring and observability tools (e.g., Prometheus, Grafana, New Relic, Datadog, OpenTelemetry).

 Strong expertise in logging, tracing, and metrics-based troubleshooting.

 Ability to design alerts that reflect customer and business impact.

 Hands-on experience with Linux, Bash, Git, CI/CD, Docker, and Kubernetes (K8s).

 Experience with Infrastructure as Code (Terraform, ARM, CloudFormation, etc.).

 Familiarity with CI/CD pipelines and deployment automation.

 Strong focus on eliminating toil through automation.

 Good understanding of AWS cloud concepts. <...
