Job Description

NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU shards, routes requests, and manages shared KV cache across heterogeneous clusters so that many accelerators feel like a single system at datacenter scale. As large language models rapidly outgrow the memory and compute budget of any single GPU, this platform enables efficient, resilient deployment of cutting-edge LLM workloads.

We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management in large-scale LLM and storage systems.

What you'll be doing:
+ Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference.
+ Architect and implement deep integrations w...
