Job Description

Dear all,

We are looking for a GPU Infrastructure Specialist to manage and optimize GPU-based environments for model hosting and high-performance computing workloads. The ideal candidate will have hands-on experience with NVIDIA, AMD, and SambaNova GPU ecosystems, along with a strong background in resource management, performance tuning, and observability in large-scale AI/ML environments.



Responsibilities


  • Manage, configure, and maintain GPU infrastructure across on-premises and cloud environments.
  • Handle GPU resource allocation, scheduling, and orchestration for AI/ML workloads.
  • Oversee driver updates, operator management, and compatibility across multiple GPU vendors (NVIDIA, AMD, SambaNova).
  • Implement GPU tuning and optimization strategies to ensure efficient model inference and training performance.
  • Monitor GPU utilization, latency, and system health using observability and alerting tools (e.g., Prometheus, Grafana, NVIDIA DCGM).
  • Collaborate with AI engineers, DevOps, and MLOps teams to ensure seamless model deployment and hosting across GPU clusters.
  • Develop automation scripts and workflows for GPU provisioning, scaling, and lifecycle management.
  • Troubleshoot GPU performance issues, memory bottlenecks, and hardware-level anomalies.


Qualifications


  • Strong experience managing GPU infrastructure (NVIDIA, AMD, SambaNova).
  • Proficiency in resource scheduling and orchestration (Kubernetes, Slurm, Ray, or similar).
  • Knowledge of driver and operator management in multi-vendor environments.
  • Experience with GPU tuning, profiling, and performance benchmarking.
  • Familiarity with observability and alerting tools (Prometheus, Grafana, ELK Stack, etc.).
  • Hands-on experience with model hosting platforms (Triton Inference Server, TensorRT, ONNX Runtime, etc.) is a plus.
  • Working knowledge of Linux systems, Docker/Kubernetes, and CI/CD pipelines.
  • Strong scripting skills in Python, Bash, or Go.


Preferred Skills



  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
  • Certifications in GPU computing (e.g., NVIDIA Certified Administrator, CUDA, or similar).
  • Experience with AI/ML model lifecycle management in production environments.

Apply for this Position

Ready to join? Click the button below to submit your application.

Submit Application