Job Description

We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our systems and services. The SRE will work closely with software engineering and operations teams to build resilient infrastructure, improve system observability, and automate operational tasks.

Key Responsibilities

  • Design, build, and maintain highly available and scalable systems
  • Monitor system performance, availability, and reliability
  • Define and track SLIs, SLOs, and SLAs
  • Respond to and resolve production incidents and outages
  • Perform root cause analysis (RCA) and implement preventive measures
  • Automate operational tasks to reduce toil
  • Improve CI/CD pipelines and deployment processes
  • Manage infrastructure as code (IaC) using tools such as Terraform or CloudFormation
  • Implement observability solutions (logging, monitoring, alerting, tracing)
  • Collaborate with de...

Apply for this Position

Ready to join CareerUS Solutions? Submit your application for this position.