Job Description
Appealing points:
Own the core of LLM serving runtime and performance: Take full ownership of request lifecycle, batching, scheduling, memory behavior, and failure handling in a high-impact LLM serving platform.
Production-grade performance engineering at scale: Drive real-world optimizations for latency, throughput, GPU utilization, and cache efficiency under realistic workloads and operational constraints.
Strong engineering discipline and cross-team collaboration: Work in an environment that values benchmarks, observability, safe rollouts, and calm incident response while partnering closely with infra, networking, and product teams.
Annual Salary: 8 Million and Above
Job Responsibilities:
Own the end-to-end serving runtime behavior: request lifecycle, streaming semantics, cancellation, retry interactions, timeouts, and consistent failure models
Design and implement batching and scheduling strategy: dy...