Staff Software Engineer - GenAI inference

Databricks

📍 San Francisco, California, United States

Full-time | Computer Occupations | Posted February 06, 2026

Job Description

P-1285
About This Role

As a staff software engineer for GenAI inference, you will lead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. You’ll bridge research advances and production demands, ensuring high throughput, low latency, and robust scaling. Your work will encompass the full GenAI inference stack: kernels, runtimes, orchestration, memory, and integration with frameworks and orchestration systems.

What You Will Do
Own and drive the architecture, design, and implementation of the inference engine, and collaborate on a model-serving stack optimized for large-scale LLM inference

Partner closely with researchers to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine

Lead the end-to-end optimization for latency, throughput, memory efficiency, and hardware...
