Job Description
What you will be doing:
We are seeking an accomplished Advanced Observability DevOps Engineer with deep expertise in Kubernetes, Helm, Grafana, and modern cloud-native architectures. This role is responsible for the design, deployment, and optimization of scalable, secure, and highly available infrastructure while driving DevOps, GitOps, and SRE best practices. The candidate will work closely with the Lead Platform Engineer and with the data ingestion and Platform DevOps teams under the Observability product. You will collaborate with cross-functional teams such as Container Platform and Platform DevOps to automate CI/CD pipelines, enhance observability, and ensure platform reliability across multi-cloud (Azure/GCP) and hybrid environments.
This position will preferably be based out of India GCC, Bangalore.
Key Responsibilities:
- Contribute to observability and monitoring initiatives as a DevOps Engineer
- Deploy and manage highly available Kubernetes clusters (AKS, GKE, or on-prem) using container orchestration and GitHub Actions
- Develop and maintain Helm charts for application deployments, ensuring versioning, templating, and dependency management
- Automate, manage, and monitor cloud-native infrastructure and deployments
- Implement GitOps workflows for declarative infrastructure and application deployments
- Optimize multi-cloud networking (VPC, CNI plugins, service meshes like Istio) for performance and security
- Build, configure, and manage observability tools (Grafana, Prometheus, Mimir, Loki, Tempo, etc.), and build observability dashboards with custom PromQL queries for Kubernetes and microservices
- Implement distributed tracing (OpenTelemetry, Jaeger, Tempo) and log aggregation (Loki, Vector)
- Define SLOs and SLIs, automate incident response (ServiceNow), and conduct blameless postmortems
- Design and maintain scalable CI/CD pipelines using GitHub Actions and Jenkins with infrastructure as code (IaC)
- Automate provisioning, scaling, and self-healing of cloud resources
- Enforce DevSecOps practices (SAST/DAST, secret management with Vault, SOPS, or Azure/GCP)
- Optimize Kubernetes resource usage (HPA, VPA) and cloud cost efficiency (FinOps principles)
- Troubleshoot cluster performance, networking bottlenecks, and storage (CSI drivers, Rook/Ceph)
- Develop and implement build and release pipelines, with accountability for managing deployment schedules, issues, risks, and impediments
- Work in an Agile development team, with individual accountability for commitment and delivery each sprint
- Troubleshoot and implement corrections to problems associated with connectivity between supported applications and the clients they serve
- Provide technical guidance in the diagnosis of issues as they arise in support of critical application deployments
- Contribute to the design, implementation, and enhancement of critical application deployment Helm charts and CI/CD pipelines
- Ensure that all implementations of observability meet the requirements prescribed by IT Services through the effective implementation or use of approved processes, methodologies, and deliverables
- Provide coding and technical direction to less experienced staff, or develop highly complex original code
- Track infrastructure delivery and dependencies through to implementation
We are searching for someone with the following skills:
- Strong experience with Kubernetes (AKS/GKE) and Helm
- Proficient in containerization technologies like Docker
- Good knowledge and understanding of Azure foundation components (e.g., App GW, APIM, Virtual Network, NSG, Load Balancer, Azure VM) is required.
- Experience with databases such as PostgreSQL, Redis, Kafka, Rook/Ceph, or similar.
- Knowledge of monitoring tools such as Log Analytics, AppDynamics, Grafana, Prometheus, Splunk, and SiteScope
- Experience with microservices architecture and service mesh technologies
- Knowledge of security best practices for DevSecOps workflows, including OPA/Gatekeeper
- Experience deploying, managing, and optimizing enterprise-level observability platforms built on Grafana OSS products such as Mimir, Loki, and Tempo, along with Fluent Bit/Vector
- Experience must include CI/CD and infrastructure automation tools: Terraform (required), AWX, Ansible (desired)
- Familiarity with Cloud technologies in Azure, OCI, and Google Cloud
- Experience with PCF, Docker, and Kubernetes platforms is required.
- Experience with scripting in Bash, Python, PowerShell, or Go is required.
- Experience with DevOps and CI/CD tools and processes is required.
- Experience automating infrastructure provisioning and configuration using Ansible and Terraform
- Experience with multi-cloud networking (VPC, CNI plugins, service meshes like Istio) required.
- Experience in working with ServiceNow or similar Service Management tools
The following certifications are strongly preferred, but not required:
- Certified Kubernetes Administrator (CKA) / Developer (CKAD) / Security (CKS)
- Microsoft Certified: Azure DevOps Engineer
- Google Cloud Professional DevOps Engineer
- Experience with Agile/Scrum methodologies is required.
We believe the successful candidate has these qualifications and experience:
What is it like at Albertsons?
Albertsons Culture Principles
Compassion: We always treat each other with kindness and respect
Team: We always support and recognize each other
Inclusive: We always value everyone's perspective
Learning: We always strive to grow and develop ourselves and others
Competitive: We always act with integrity to win over the customer
Ownership: We always take actions to drive our success