Job Description
- Build reliable, scalable systems through automation and engineering.
- Improve service stability using SLOs, monitoring, and incident response.
About Our Client
A U.S.-based e-commerce organization specializing in personalized products, operating high-volume digital platforms supported by global teams. The company emphasizes technology-driven operations, strong customer experience, and scalable infrastructure to support rapid growth and large production capacity.
Description
Reliability & Performance
- Define and manage SLIs, SLOs, and error budgets.
- Improve system reliability, scalability, and resilience.
- Lead reliability reviews and prevent incidents proactively.
Observability & Monitoring
- Build and maintain monitoring, logging, and alerting.
- Ensure alerts are actionable and dashboards are effective.
- Implement distributed tracing. ...