Job Description

We are looking for a motivated Site Reliability Engineer (SRE) to drive operational excellence across our software development teams by ensuring the availability, performance and scalability of our production systems. You will work closely with one or more software development teams to raise the bar on their observability practices, strengthen incident response capabilities and reduce operational toil through automation.

Key Responsibilities
● Implement and manage the observability stack (metrics, logs, traces and alerts) to ensure optimal performance and availability
● Analyze observability data to proactively identify performance bottlenecks and drive reliability improvements
● Define, track and report on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services
● Identify, develop and implement automation tools to reduce operational toil and improve system reliabili...

Apply for this Position

Ready to join TVH? Submit your application.