Job Description

P-1285

About This Role


As a staff software engineer for GenAI inference, you will lead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. You’ll bridge research advances and production demands, ensuring high throughput, low latency, and robust scaling. Your work will encompass the full GenAI inference stack: kernels, runtimes, orchestration, memory, and integration with frameworks and orchestration systems.


What You Will Do

  • Own and drive the architecture, design, and implementation of the inference engine, and collaborate on a model-serving stack optimized for large-scale LLM inference

  • Partner closely with researchers to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine

  • Lead the end-to-end optimization for latency, throughput, memory efficiency, and hardware...