Job Description

We are looking for an experienced ML Engineer with a strong background in machine learning who is ready to take on the responsibilities of this role.
ML Infrastructure
Performance Engineer
Focus:
This role focuses on the "serving plane." The engineer will integrate high-speed inference runtimes with streaming loaders and take ownership of the performance benchmarking mandate.
Key Responsibilities:
Integrate SGLang with the Run:ai Model Streamer to enable concurrent tensor streaming directly to GPU memory, reducing model "cold start" times.
Optimize SGLang's backend runtime, leveraging features like RadixAttention for prefix caching and compressed finite-state machines for faster decoding.
Design and execute rigorous performance benchmarking suites to identify bottlenecks in the inference stack and provide code-level "fixes" to improve t...

Apply for this Position

Ready to join Yantran LLC? Submit your application.