Job Description
Responsibilities
- Deploy and operate LLMs on GPUs (NVIDIA, cloud or on-prem).
- Run and tune inference servers such as vLLM, TGI, SGLang, Triton, or equivalents (a minimal vLLM sketch follows this list).
- Make capacity-planning decisions: how many GPUs are required for X RPS, when to shard, batch... (a back-of-envelope sizing sketch follows this list).
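As a rough illustration of the second responsibility, here is a minimal sketch of standing up a vLLM offline engine with a few common tuning knobs. The model name, parallelism degree, and tuning values are illustrative assumptions, not requirements of the role.

```python
# Minimal vLLM engine sketch; all values below are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    tensor_parallel_size=1,        # shard across N GPUs when one GPU can't hold the model
    gpu_memory_utilization=0.90,   # fraction of VRAM reserved for weights + KV cache
    max_num_seqs=256,              # cap on concurrently batched sequences
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain continuous batching in one sentence."], params)
print(outputs[0].outputs[0].text)
```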
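And for the capacity-planning responsibility, a back-of-envelope sizing calculation of the "how many GPUs for X RPS" kind. All numbers here are assumptions for illustration; in practice they come from benchmarking your own serving stack.

```python
# Back-of-envelope GPU sizing; every input below is an illustrative assumption.
import math

def gpus_needed(target_rps: float,
                tokens_per_request: float,
                gpu_tokens_per_sec: float,
                headroom: float = 0.7) -> int:
    """GPUs required to serve target_rps, derating each GPU to `headroom`
    of its benchmarked decode throughput to absorb traffic bursts."""
    required_tps = target_rps * tokens_per_request
    effective_tps_per_gpu = gpu_tokens_per_sec * headroom
    return math.ceil(required_tps / effective_tps_per_gpu)

# Example: 50 RPS at ~400 generated tokens/request needs 20,000 tokens/s;
# a GPU benchmarked at ~10,000 tokens/s derated to 7,000 gives ceil(20000/7000) = 3.
print(gpus_needed(50, 400, 10_000))  # -> 3
```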