Job Description

  • Build reliable, scalable systems through automation and engineering.
  • Improve service stability using SLOs, monitoring, and incident response.

About Our Client

Our client is a U.S.-based e-commerce organization that specializes in personalized products and operates high-volume digital platforms supported by global teams. The company emphasizes technology-driven operations, a strong customer experience, and scalable infrastructure to support rapid growth and large production capacity.

Description

Reliability & Performance

  • Define and manage SLIs, SLOs, and error budgets.
  • Improve system reliability, scalability, and resilience.
  • Lead reliability reviews and prevent incidents proactively.

Observability & Monitoring

  • Build and maintain monitoring, logging, and alerting.
  • Ensure actionable alerts and effective dashboards.
  • Implement distributed tracing.
  • ...

Apply for this Position

Ready to join Michael Page Colombia? Click the button below to submit your application.