Job Description

Overview

As a senior DevOps Engineer, you will own the AWS infrastructure and DevOps toolchain for a high-scale ad serving system composed of asynchronous Java microservices (Akka framework).


Targets include <50 ms response time and up to 5M concurrent users with 99.99% uptime.


Responsibilities

Design and stand up AWS environments end-to-end (landing zone, VPCs, networking, security, automation).


Build immutable infrastructure and CI/CD for Java microservices (Maven/Gradle), including blue/green and canary releases and automated rollbacks.


Implement observability: metrics, logs, traces, SLOs/SLIs, alerting, on-call runbooks.


Engineer reliability and performance: autoscaling, caching layers, multi-AZ/region DR, capacity planning to support 5M+ concurrent users and p95/p99 latency goals.


Establish security-by-design: IAM least privilege, KMS/Secrets Manager, WAF/Shield, image/signing policies, CIS benchmarks.


Partner with EY developers and the Performance Test Engineer to tune JVM/Akka, thread pools, GC, and infra limits based on load-testing feedback.


Champion cost governance and tagging; produce dashboards and weekly reports.


Tech you’ll use (you don’t need every single one, but you know most)

AWS: EKS/ECS, EC2, ALB/NLB, API Gateway/Lambda, S3/CloudFront, DynamoDB/ElastiCache (Redis), Aurora/RDS, MSK/Kinesis, OpenSearch, Route 53, VPC, NAT/GW, WAF/Shield, CloudWatch/X-Ray, IAM, KMS, Secrets Manager.


IaC & CI/CD: Terraform/CloudFormation, Helm, Argo CD or Flux, GitHub Actions/Jenkins/GitLab CI, Docker.


Observability: CloudWatch, OpenTelemetry, Prometheus/Grafana, log pipelines.


Languages/Build: Bash/Python for automation; familiarity with Java build/release workflows.


What makes you a great fit

3–5+ years total experience; Senior/Manager-level depth in AWS platform engineering for high-throughput, low-latency services.


Proven ownership of production systems at 10k–1M+ concurrent users (or comparable high RPS) with 99.9x SLOs.


Hands-on with Akka/Java microservice delivery pipelines (nice if you’ve tuned JVM, GC, Akka dispatchers).


Strong grounding in scaling patterns (event-driven, async IO, caching, backpressure, rate limiting) and resilience (circuit breakers, retries, chaos).


Excellent collaboration, documentation, and stakeholder communication.


Logistics

Location: Remote (prefer India candidates)

Schedule: Must join US morning calls (Eastern Time) as needed.


Start: 1–3 weeks from offer.


Term: Through end of January (likely extension).
