Job Description
Deep Learning Solutions Architect – Inference Optimization
NVIDIA’s Worldwide Field Operations (WWFO) team is seeking a Solutions Architect with a deep understanding of neural network inference. As our customers adopt increasingly complex inference pipelines on state‑of‑the‑art infrastructure, there is a growing need for experts who can guide the integration of advanced inference techniques such as speculative decoding, request‑scheduler optimization, and FP4 quantization. The ideal candidate will be proficient with tools such as TensorRT‑LLM, vLLM, SGLang, or similar, and will have the strong systems knowledge needed to help customers fully use the capabilities of the new GB300 NVL72 systems.
What You Will Be Doing
- Work directly with key customers to understand their technology and provide the best AI solutions.
- Perform in-depth analysis and optimization to ensure the best performance on GPU systems, in particular Grace/Arm‑based systems. This includes supporting the optimization of large‑scale inference pipelines.
- Partner with Engineering, Product, and Sales teams to plan and develop the solutions best suited to each customer. Drive the development and growth of product features through customer feedback and proof‑of‑concept evaluations.
What We Need To See
- Excellent verbal, written communication, and technical presentation skills in English.
- MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields.
- 5+ years of work or research experience in software development with Python, C++, or comparable languages.
- Work experience and knowledge of modern NLP, including a good understanding of transformer, state‑space, diffusion, or MoE model architectures. This can include expertise in either training or optimization/compression/operation of DNNs.
- Understanding of key libraries used for NLP/LLM training (such as Megatron‑LM, NeMo, DeepSpeed etc.) and/or deployment (e.g. TensorRT‑LLM, vLLM, Triton Inference Server).
- Enthusiasm for collaborating with teams across the company, including Engineering, Product, Sales, and Marketing, combined with the ability to thrive in dynamic environments and stay focused amid constant change.
- Self‑starter with a growth mindset, a passion for continuous learning, and a habit of sharing findings across the team.
Ways To Stand Out From The Crowd
- Demonstrated experience in running and debugging large‑scale distributed deep learning training or inference processes.
- Experience working with large transformer‑based architectures for NLP, CV, ASR, or other domains.
- Applied NLP technology in production environments.
- Proficient with DevOps tools including Docker, Kubernetes, and Singularity.
- Understanding of HPC systems: data center design, high‑speed interconnects such as InfiniBand, cluster storage, and scheduling, including design and/or management experience.
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.