Job Description
Dear all,
We are looking for a GPU Infrastructure Specialist to manage and optimize GPU-based environments for model hosting and high-performance computing workloads. The ideal candidate will have hands-on experience with the NVIDIA, AMD, and SambaNova GPU ecosystems and a strong background in resource management, performance tuning, and observability in large-scale AI/ML environments.
Responsibilities
Manage, configure, and maintain GPU infrastructure across on-premises and cloud environments.
Handle GPU resource allocation, scheduling, and orchestration for AI/ML workloads.
Oversee driver updates, operator management, and compatibility across multiple GPU vendors (NVIDIA, AMD, SambaNova).
Implement GPU tuning and performance optimization strategies for efficient model inference and training.
Monitor GPU utilization, latency, and system health using observability and alerting tools (e.g., Prometheus, Grafana, NVIDIA DCGM).
Collaborate with A...