Inference Systems Engineer

Fidel Consulting KK

📍 Tokyo (Hybrid), Japan

Full-time, posted February 12, 2026

Job Description

                        
Appealing points:

	Own the core of the LLM serving runtime and its performance: take full ownership of the request lifecycle, batching, scheduling, memory behavior, and failure handling in a high-impact LLM serving platform.
	Production-grade performance engineering at scale: drive real-world optimizations for latency, throughput, GPU utilization, and cache efficiency under realistic workloads and operational constraints.
	Strong engineering discipline and cross-team collaboration: work in an environment that values benchmarks, observability, safe rollouts, and calm incident response while partnering closely with infra, networking, and product teams.

Annual Salary: 8 Million and Above

Job Responsibilities:

	Own end-to-end serving runtime behavior: request lifecycle, streaming semantics, cancellation, retry interactions, timeouts, and consistent failure models
	Design and implement batching and scheduling strategy: dy...
