Job Description
What You’ll Do
You’ll operate at the intersection of software engineering and systems engineering, building resilient systems that scale, self-heal, and empower developers to ship safely.
Reliability Engineering
- Define and manage SLIs, SLOs, and error budgets
- Reduce MTTD, MTTA, and MTTR through structured incident response
- Conduct blameless postmortems and drive preventative improvements
- Champion reliability in architectural reviews and production readiness
Observability & Monitoring
- Design actionable, symptom-based alerts (not noise)
- Build dashboards and tracing systems with tools such as CloudWatch, Prometheus, Grafana, New Relic, X-Ray, and ADOT
- Implement synthetic monitoring to simulate real user journeys (URLs, clickpaths, APIs)
- Ensure full observability coverage across critical paths
Cloud & Infrastructure
- Operate and optimize AWS environment...
Apply for this Position
Ready to join Devopie Inc.? Submit your application.