Job Description
We are hiring an engineer who has personally built and optimized distributed training systems for large AI models, and who has deep, hands-on experience tuning GPU workloads specifically on Google Cloud.
This is not a research role, not a general ML engineering role, and not a cloud-agnostic one.
Core Responsibilities
Distributed Training (Foundation-Scale)
- Build and operate multi-node, multi-GPU distributed training systems (16–128+ GPUs).
- Implement and tune (see the sketch after this list):
  - PyTorch Distributed (DDP, FSDP, TorchElastic)
  - DeepSpeed (ZeRO-2 / ZeRO-3, CPU/NVMe offload)
  - Hybrid parallelism (data, tensor, pipeline)
- Create reusable distributed training frameworks and templates for large models.
- Handle checkpoint sharding, failure recovery, and elastic scaling.
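To give a concrete sense of the expected baseline, here is a minimal multi-GPU FSDP training sketch. Everything in it (model, shapes, hyperparameters) is an illustrative placeholder, not a prescribed stack:

```python
# Minimal FSDP training loop, intended to run under torchrun.
# All model and hyperparameter choices below are illustrative placeholders.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real job would construct the large model here.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # similar in spirit to DeepSpeed ZeRO-3.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):  # placeholder loop; real training streams a dataset
        batch = torch.randn(8, 1024, device="cuda")
        loss = model(batch).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

A job like this would typically be launched per node with `torchrun --nnodes=2 --nproc_per_node=8 train.py`; torchrun's `--max-restarts` option provides the elastic, fault-tolerant restarts mentioned above.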
GPU Optimization (Google Cloud Only)
- Optimize GPU utilization and cost on Google Cloud GPUs: …
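One basic ingredient of utilization and cost work, on any cloud, is measuring whether the GPUs you pay for are actually busy. A minimal sampling sketch, assuming the `pynvml` NVML bindings are installed:

```python
# Samples SM and memory utilization across all visible GPUs.
# pynvml is an assumed dependency here; any NVML binding would do.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

for _ in range(12):  # roughly one minute of 5-second samples
    for i, h in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        mem = pynvml.nvmlDeviceGetMemoryInfo(h)
        print(f"gpu{i}: sm={util.gpu}% vram={mem.used / mem.total:.0%}")
    time.sleep(5)

pynvml.nvmlShutdown()
```

Sustained low SM utilization on an expensive GPU VM is the usual first signal that a job is input-bound or over-provisioned, and therefore a cost target.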