Job Description

Responsibilities

  • Deploy and operate LLMs on GPUs (NVIDIA, cloud or on-prem).
  • Run and tune inference servers such as vLLM, TGI, SGLang, Triton, or equivalents.
  • Make capacity-planning decisions: how many GPUs are required for X RPS, when to shard, batch… (a back-of-envelope sketch follows this list).
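
As a rough illustration of the capacity-planning question above, the sketch below estimates a GPU count from a target request rate. Every figure in it (tokens per request, per-GPU throughput, the headroom factor) is an assumed example for illustration, not a number from this posting.

    import math

    def gpus_needed(target_rps: float,
                    tokens_per_request: int,
                    gpu_tokens_per_sec: float,
                    headroom: float = 0.7) -> int:
        """Estimate GPUs required to serve target_rps.

        gpu_tokens_per_sec is the measured generation throughput of one
        GPU under batching (e.g. from a benchmark run against vLLM or
        TGI); headroom derates it so the fleet is not planned at 100%
        utilization. All values here are illustrative assumptions.
        """
        demand = target_rps * tokens_per_request        # tokens/s the fleet must produce
        supply_per_gpu = gpu_tokens_per_sec * headroom  # derated tokens/s per GPU
        return math.ceil(demand / supply_per_gpu)

    # Example: 50 RPS at ~400 generated tokens per request against
    # ~5,000 tok/s per GPU gives ceil(20,000 / 3,500) = 6 GPUs.
    print(gpus_needed(50, 400, 5_000))

In practice the per-GPU throughput figure would come from load-testing the chosen inference server, and sharding or batching choices change it substantially.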
