Job Description

**Description**

**Platform & System Reliability (SRE)**



+ Build and maintain highly available, scalable, and fault-tolerant systems in GCP and other cloud environments.

+ Design and implement automated solutions to eliminate toil and improve operational efficiency.

+ Develop, refine, and maintain monitoring, observability, and alerting systems across infrastructure and services.

+ Instrument platforms with OpenTelemetry for metrics, logs, and traces.

+ Own incident response processes, including on-call participation, root-cause analysis, and post-incident improvement actions.

+ Build and support CI/CD pipelines, GitOps workflows, and infrastructure-as-code deployments (e.g., Terraform).

**Data Reliability Engineering (DRE)**

+ Ensure reliability, accuracy, and availability of batch, streaming, and real-time data pipelines.

+ Instrument data flows with data observability patterns, including lineage (OpenLineage), freshness, completeness, and quality checks.

+ Monitor data systems end-to-end using automated alerting and anomaly detection.

+ Contribute to data SLOs, SLIs, and error budgets that measure reliability and drive continuous improvement.

+ Improve performance, scalability, and resilience across data storage systems (SQL, NoSQL, lakehouse, analytics services).



**Qualifications**



+ 5–7 years in Site Reliability Engineering, Data Engineering, Platform Engineering, or similar roles.

+ Strong experience in GCP (preferred) plus exposure to OCI/Azure.

+ Proficiency in Python, Go, Bash, or similar languages for automation and tooling.

+ Hands-on experience with containerization, service mesh, and distributed systems design.

+ Expertise with observability platforms and telemetry standards (Prometheus, Grafana, Cloud Monitoring, OpenTelemetry).

+ Solid understanding of networking, Linux fundamentals, and scalable system design.

+ Familiarity with modern data platforms (BigQuery, Kafka, Spark, data lakes) and data reliability concepts.

+ Experience with IaC practices (Terraform, Ansible) and CI/CD systems.

+ Excellent communication skills for partnering with platform, data, and application teams.

+ Ability to work with team members and clients to assess needs, provide assistance, and resolve problems.

+ Strong problem-solving and analytical skills.

+ Desire to understand why things work the way they do.

+ Ability to present and explain technical concepts to business audiences.

+ All other duties as assigned.



This job posting will remain open a minimum of 72 hours and on an ongoing basis until filled.
**Job:** Information Technology
**Primary Location:** India-Karnataka-Bengaluru
**Schedule:** Full-time
**Travel:** No
**Req ID:** 254849
**Job Hire Type:** Experienced
