Senior DevOps Engineer / Platform Reliability Lead

📍 India, West Bengal, India

Full-time | Computer Occupations | Posted January 23, 2026

Job Description

Experience: 10-12+ years
Location: Kolkata

Role Overview
We are seeking a Senior DevOps Engineer / Platform Reliability Lead who can take an end-to-end view of our systems, identify improvement areas across architecture, infrastructure, deployment pipelines, and reliability, and guide the platform toward higher scalability, stability, and operational maturity.
This role requires strong systems thinking, sound architectural judgment, and the ability to call out risks and improvements clearly.

Key Responsibilities
Review the complete backend ecosystem (Node.js, Golang services, cloud infrastructure, CI/CD).
Identify architectural, scalability, reliability, and security gaps following the in-house migration.
Recommend and prioritise short-term fixes and long-term platform improvements.
Own containerized infrastructure using Docker and Kubernetes in production.
Design and maintain robust CI/CD pipelines with safe deployment and rollback strategies.
Implement and improve monitoring, logging, alerting, and incident response practices.
Define and track meaningful SLIs, SLOs, and error budgets.
Prepare systems for OTT traffic spikes during releases and live events.
Improve caching, queuing, and backend performance in collaboration with backend teams.
Drive secure access, secrets management, and cloud cost optimisation.
Act as a technical partner to backend, product, and leadership teams.

Required Technical Skills
Cloud & Infrastructure
Strong experience with AWS (EC2, EKS/ECS, S3, RDS/DynamoDB, IAM)
Docker and Kubernetes (production environments)
Infrastructure as Code – Terraform (preferred)
CI/CD & Operations
GitHub Actions / GitLab CI / Jenkins
Blue-green / canary deployments and rollback strategies
Backend Awareness
Node.js (Express/NestJS-level understanding)
Golang (microservices, concurrency, profiling basics)
Observability
Prometheus, Grafana
Centralised logging (ELK / OpenSearch / Loki)
Distributed tracing (Jaeger / OpenTelemetry)
Data, Cache & Messaging
Redis (cache and/or queues)
Kafka / SQS / RabbitMQ (deep experience with at least one)
MongoDB (understanding of NoSQL databases; experience with Atlas offerings is a bonus)
Security & Reliability
Secrets management (Vault / AWS Secrets Manager)
IAM and least-privilege access design
Production incident handling experience

Personality & Mindset 
Strong ownership and accountability for platform reliability.
Comfortable identifying what is wrong and explaining how to fix it.
Calm and structured during incidents and high-pressure situations.
Clear communication with engineers and non-technical stakeholders.
Systems thinker who understands end-to-end impact, not just isolated components.
Pragmatic, data-driven, and collaborative.

Reach out to: [email protected] / [email protected]
