Job Description
About the Role
We’re looking for an experienced Site Reliability Engineer (SRE) to join our team. In this role, you’ll work on meaningful projects that improve the reliability, performance, and efficiency of our systems. You’ll help reduce manual work through automation, support incident response, and contribute to continuous improvement efforts.
This position is ideal for someone who enjoys solving complex problems, collaborating across teams, and making systems more resilient and scalable.
Responsibilities
Design and implement solutions to improve system reliability and reduce manual tasks.
Monitor distributed systems and their dependencies to ensure performance and availability.
Automate recovery processes to maintain service levels.
Participate in on-call rotations and support incident response.
Share knowledge and provide informal mentorship to team members.
Contribute to process and tooling improvements based on hands-on experience.
Requirements
Experience with SRE practices such as monitoring, incident response, and automation.
CI/CD (including microservices pipeline design and large-scale Docker image handling) and Git.
Proficiency in writing Terraform modules.
Demonstrated experience with networking (NAT, outbound proxies, subnet-to-cluster communication).
Exposure to application architecture and pre-production performance testing.
Demonstrated cloud experience.
Linux (including Bash scripting).
Strong troubleshooting and cost optimization skills.
Security (Snyk, infrastructure security tooling and processes).
Familiarity with distributed systems and cloud infrastructure.
Ability to write scripts or code to automate tasks (e.g., Python, Bash, Go).
Strong problem-solving skills and a collaborative mindset.
Willingness to learn and grow in a supportive team environment.
Apply for this Position
Ready to join? Click the button below to submit your application.
Submit Application