Site Reliability Engineer

📍 WorkFromHome, Ciudad de México, Mexico
Full-time, Networks and Systems, Posted January 26, 2026
Job Description

At Coderoad, we're more than just a software development company: we're your gateway to the global tech world. Whether you're looking to skill up or level up your career, we offer the challenges you’ve been searching for.
We provide end-to-end software development services and give you the opportunity to work on exciting, real-world projects in a supportive environment. Whether it's staff augmentation, dedicated IT teams, or general software engineering, we have opportunities for everyone to challenge themselves and take their career to the next level! 

About the role

We are looking for a Senior Site Reliability Engineer (SRE) with strong experience in observability, metrics, logging, and reliability engineering. This role will lead the design and implementation of our monitoring and observability strategy across multiple services, ensuring system performance, resiliency, and operational excellence.
The ideal candidate combines deep expertise in SRE practices, strong understanding of software engineering, and hands‑on experience with modern observability stacks. 
Location: LATAM
Time Zone: Team operates on U.S. East/West Coast hours.

How You’ll Make an Impact

Observability & Monitoring
Define and implement SLIs, SLOs, and error budgets for critical services.
Design and maintain dashboards and alerting systems using tools like Prometheus, Grafana, ELK, OpenTelemetry, or equivalents. 
Standardize logging, tracing, and metrics across all applications and services. 
Continuously improve the system’s visibility and health tracking to support high availability. 

Reliability Engineering
Drive incident response, post‑mortems, and root‑cause analyses.
Identify performance bottlenecks and propose architectural improvements. 
Implement chaos testing and resilience strategies where applicable. 

DevOps & Automation
Develop CI/CD improvements that support reliability and quality.
Automate operational workflows, deployments, and monitoring pipelines. 
Collaborate with development teams to ensure reliability is built into every service. 

Collaboration & Technical Leadership
Work closely with software engineers to establish observability best practices.
Create internal standards for logs, metrics, and distributed tracing. 
Provide technical mentorship and help shape long‑term reliability roadmaps. 

What We’re Looking For

5‑7+ years of experience in SRE, DevOps, or Platform Engineering roles.
Strong experience with observability tools such as Prometheus, Grafana, ELK Stack, OpenTelemetry, Jaeger, Datadog, New Relic, etc. 
Solid understanding of Kubernetes, Docker, and cloud platforms (AWS/GCP/Azure).
Proficiency in at least one programming language (e.g., Java, Go, Python, Node.js). 
Experience implementing SLIs, SLOs, alerting strategies, and incident response. 
Ability to work cross‑functionally and drive technical decisions. 

Nice to Have

Experience with service mesh technologies (e.g., Istio).
Background in performance testing, load testing, or capacity planning. 
Experience with infrastructure as code (Terraform, Ansible). 

What you’ll love

USA Contractor
100% Remote 
Holidays Off 
Paid Time Off 
Health insurance assistance program. 
Competitive Pay (USD) 
Excellent teamwork and work environment 
Training 

Seniority Level: Mid‑Senior level
Employment Type: Contract
Job Function: Consulting and Business Development
Industries: IT Services and IT Consulting