Job Description
**Description**
**Platform & System Reliability (SRE)**
+ Build and maintain highly available, scalable, and fault-tolerant systems in GCP and other cloud environments.
+ Design and implement automated solutions to eliminate toil and improve operational efficiency.
+ Develop, refine, and maintain monitoring, observability, and alerting systems across infrastructure and services.
+ Instrument platforms with OpenTelemetry for metrics, logs, and traces.
+ Own incident response processes, including on-call participation, root-cause analysis, and post-incident improvement actions.
+ Build and support CI/CD pipelines, GitOps workflows, and infrastructure-as-code deployments (e.g., Terraform).
**Data Reliability Engineering (DRE)**
+ Ensure reliability, accuracy, and availability of batch, streaming, and real-time data pipelines.
+ Instrument data flows with data observability patterns, including lineage (OpenLineage), freshness, completeness, and quality checks.
+ Monitor data systems end-to-end using automated alerting and anomaly detection.
+ Contribute to data SLOs, SLIs, and error budgets that measure reliability and drive continuous improvement.
+ Improve performance, scalability, and resilience across data storage systems (SQL, NoSQL, lakehouse, analytics services).
**Qualifications**
+ 5–7 years of experience in Site Reliability Engineering, Data Engineering, Platform Engineering, or similar roles.
+ Strong experience with GCP (preferred), plus exposure to OCI or Azure.
+ Proficiency in Python, Go, Bash, or similar languages for automation and tooling.
+ Hands-on experience with containerization, service mesh, and distributed systems design.
+ Expertise with observability platforms and telemetry standards (Prometheus, Grafana, Cloud Monitoring, OpenTelemetry).
+ Solid understanding of networking, Linux fundamentals, and scalable system design.
+ Familiarity with modern data platforms (BigQuery, Kafka, Spark, data lakes) and data reliability concepts.
+ Experience with IaC practices (Terraform, Ansible) and CI/CD systems.
+ Excellent communication skills for partnering with platform, data, and application teams.
+ Ability to work with team members and clients to assess needs, provide assistance, and resolve problems.
+ Strong problem-solving and analytical skills.
+ Desire to understand why things work the way they do.
+ Ability to present and explain technical concepts to business audiences.
+ All other duties as assigned.
This job posting will remain open for a minimum of 72 hours and will then stay open on an ongoing basis until filled.
**Job:** Information Technology
**Primary Location:** India-Karnataka-Bengaluru
**Schedule:** Full-time
**Travel:** No
**Req ID:** 254849
**Job Hire Type:** Experienced