Site Reliability Engineer

📍 Israel, Israel, Israel
Full-time | Computer Occupations | Posted January 21, 2026
Job Description

Realize your potential by joining the leading performance-driven advertising company! 

As a Site Reliability Engineer on the IT Production team in our TLV office, you’ll play a vital role in building robust services and solving infrastructure challenges through automation, working with cutting-edge technologies and pushing them to their limits on our mostly on-prem, cloud-like infrastructure.

To thrive in this role, you’ll need:
7 years of experience as an SRE, DevOps Engineer or System Administrator in a large distributed environment, with a focus on Linux operating systems.

Experience supporting, troubleshooting and scaling large distributed systems in production.

Deep understanding of the HTTP protocol, including HTTP/1.1, HTTP/2, caching semantics, TLS and gRPC delivery.

Experience configuring and operating CDN services (e.g., Akamai, Fastly, Cloudflare, AWS CloudFront).

Deep understanding of Linux system internals and system performance tuning.

Experience with configuration management and IaC tools (Puppet, Ansible, Chef, Terraform).

Experience programming in at least one of the following languages: Python, Golang, Rust, Ruby, C++, Java.

Experience with monitoring and metrics collection systems (Prometheus, Grafana, ELK).

Experience with cloud providers and platforms (AWS, Azure, GCP, Alibaba).

Experience with containerization technologies (Kubernetes, Docker).

Deep understanding of networking principles (TCP/IP, DNS, load balancing).

How you’ll make an impact:

As a Site Reliability Engineer, you’ll:
Ensure Reliability & Scalability: Design, implement and manage highly reliable and scalable distributed systems across our on-premises, cloud and AI/ML environments. Proactively optimize performance, efficiency, resource utilization and cloud costs.

Drive Automation: Automate repetitive tasks, infrastructure provisioning, configuration and deployments using IaC and scripting languages (e.g., Python, Go, Rust).

Develop Observability & Capacity: Implement comprehensive monitoring and alerting systems to ensure system health. Collaborate on capacity planning to meet future growth.

Maintain Security & Compliance: Integrate security best practices and ensure compliance with industry standards.

Lead Incident Management: Participate in on-call rotations, lead incident response and conduct root cause analysis to minimize downtime.

Foster Collaboration & Improvement: Work closely with development, operations and security teams to drive shared responsibility and continuous improvement in SRE practices.

Our Tech Stack:

Linux, Kubernetes, nginx, Istio, AWS, GCP, Azure, Alicloud, Fastly, Terraform, Consul, Prometheus, Loki, Grafana, Airflow, Redis, Kafka, Vector, Hadoop, Cassandra, Vertica, MySQL, HDFS, ELK.