Job Description

Site Reliability Engineer (SRE) – Incident Response / Production Monitoring
We are looking for a Site Reliability Engineer to support large-scale, customer-facing production platforms in a cloud environment. The role focuses on monitoring systems, managing incidents, performing root cause analysis, and improving overall reliability and uptime.

Key Responsibilities
Monitor production systems and respond to alerts during business hours
Acknowledge, triage, and manage incidents (P1–P4)
Investigate issues using logs, metrics, and traces
Perform Root Cause Analysis (RCA) and create postmortem documentation
Lead incident bridge calls and coordinate cross-functional teams
Communicate incident status to engineering and leadership teams
Improve monitoring, alerting, and observability coverage
Contribute to automation initiatives to reduce manual operational work

Required Skills & Experience
Experience in Incident Management / Incident Lifecycle
Background ...
