Job Description


Highlights:


Own the core of the LLM serving runtime and its performance: Take full ownership of the request lifecycle, batching, scheduling, memory behavior, and failure handling in a high-impact LLM serving platform.
Production-grade performance engineering at scale: Drive real-world optimizations for latency, throughput, GPU utilization, and cache efficiency under realistic workloads and operational constraints.
Strong engineering discipline and cross-team collaboration: Work in an environment that values benchmarks, observability, safe rollouts, and calm incident response, while partnering closely with the infrastructure, networking, and product teams.


Annual Salary: 8 Million and Above

Job Responsibilities:


Own the end-to-end behavior of the serving runtime: request lifecycle, streaming semantics, cancellation, interaction with retries, timeouts, and consistent failure models.
Design and implement batching and scheduling strategy: dy...

Ready to join Fidel Consulting KK? Submit your application.