Job Description
We are hiring an engineer who has personally built and optimized distributed training systems for large AI models and has deep, real-world experience optimizing GPU workloads specifically on Google Cloud.
This is not a research role or a general ML engineering role, and it is not cloud-agnostic.
Core Responsibilities
Distributed Training (Foundation-Scale)
- Build and operate multi-node, multi-GPU distributed training systems (16–128+ GPUs).
- Implement and tune:
  - PyTorch Distributed (DDP, FSDP, TorchElastic)
  - DeepSpeed (ZeRO-2 / ZeRO-3, CPU/NVMe offload)
  - Hybrid parallelism (data, tensor, pipeline)
- Create reusable distributed training frameworks and templates for large models.
- Handle checkpoint sharding, failure recovery, and elastic scaling.
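As a minimal sketch of the baseline this role builds on, the snippet below wraps a model in PyTorch `DistributedDataParallel` and writes a rank-0 checkpoint. It uses the `gloo` backend in a single process so it runs anywhere; a real multi-node job would use `nccl`, launch via `torchrun`, and shard checkpoints rather than saving a single file.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process sketch: in a real job, torchrun sets RANK, WORLD_SIZE,
# MASTER_ADDR, and MASTER_PORT for every worker.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)  # "nccl" on GPU nodes

model = torch.nn.Linear(16, 4)
ddp_model = DDP(model)  # gradients are all-reduced across ranks in backward()
opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

x = torch.randn(8, 16)
loss = ddp_model(x).pow(2).mean()
loss.backward()  # triggers the cross-rank gradient all-reduce
opt.step()

# Only rank 0 writes the checkpoint; other ranks would skip the write.
if dist.get_rank() == 0:
    torch.save(ddp_model.module.state_dict(), "/tmp/ckpt.pt")
dist.destroy_process_group()
```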
GPU Optimization (Google Cloud Only)
- Optimize GPU utilization and cost on Google Cloud GPUs:
  - A100, H100, L4
- Achieve high utilization through:
  - Mixed precision (FP16 / BF16)
  - Gradient checkpointing
  - Memory optimization and recomputation
- Tune NCCL communication (All-Reduce, All-Gather) for multi-node GCP clusters.
- Reduce GPU idle time and cost per training run.
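To illustrate two of the techniques above, here is a minimal PyTorch sketch combining BF16 autocast with activation (gradient) checkpointing. It uses `device_type="cpu"` so it runs without a GPU; on A100/H100 the device type would be `"cuda"` and the model would be moved to the device first.

```python
import torch
from torch.utils.checkpoint import checkpoint

model = torch.nn.Sequential(
    torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 32)
)
x = torch.randn(4, 32, requires_grad=True)

# Mixed precision: matmuls run in bfloat16 under autocast, while
# numerically sensitive ops stay in float32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # Gradient checkpointing: activations inside `model` are not stored
    # during forward; they are recomputed during backward, trading extra
    # FLOPs for a smaller activation-memory footprint.
    y = checkpoint(model, x, use_reentrant=False)
    loss = y.float().mean()

loss.backward()
```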
Google Cloud Execution
- Run and optimize training jobs using:
  - Vertex AI custom training
  - GKE with GPU node pools
  - Compute Engine GPU VMs
- Optimize GPU scheduling, scaling, and placement.
- Use preemptible (Spot) GPU capacity safely for large training jobs.
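For context, a multi-node A100 custom training job on Vertex AI is typically submitted along these lines; the project, region, image URI, and machine shapes below are illustrative placeholders, not a prescribed configuration.

```shell
# Illustrative only: replace region, display name, replica counts,
# and the container image URI with real values.
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name=llm-pretrain-sketch \
  --worker-pool-spec=machine-type=a2-highgpu-8g,replica-count=4,accelerator-type=NVIDIA_TESLA_A100,accelerator-count=8,container-image-uri=us-docker.pkg.dev/my-project/train/llm:latest
```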
Performance Profiling
- Profile and debug GPU workloads using:
  - NVIDIA Nsight Systems / Compute
  - DCGM
- Identify compute, memory, and communication bottlenecks.
- Produce performance benchmarks and optimization reports.
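A typical profiling pass with these tools looks roughly like the following; the script name and DCGM field IDs are illustrative and would be adapted per job.

```shell
# Trace CUDA kernels and NVTX ranges for one training run (Nsight Systems).
nsys profile -t cuda,nvtx -o train_profile python train.py
nsys stats train_profile.nsys-rep

# Sample per-GPU telemetry with DCGM while the job runs
# (field IDs illustrative, e.g. GPU utilization and framebuffer usage).
dcgmi dmon -e 203,252 -d 1000
```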
Required Experience (Recruiter Screening Criteria)
Must-Have Experience (Non-Negotiable)
- 8+ years in ML systems, distributed systems, or HPC
- Hands-on experience scaling multi-node GPU training (16+ GPUs)
- Deep expertise in:
  - PyTorch Distributed
  - DeepSpeed
  - NCCL
- Direct production experience on Google Cloud GPUs
- Proven record of GPU performance and cost optimization
Strongly Preferred
- Experience training foundation models / LLM-scale models
- Experience with Vertex AI + GKE
- Experience optimizing GPU workloads at enterprise scale
Apply for this Position
Ready to join? Click the button below to submit your application.
Submit Application