Job Description

Role Summary

Firmus is seeking a highly skilled and driven Kubernetes HPC Engineer to join our Software Defined Infrastructure team. In this role, you will build high-performance, fault-tolerant, and reliable infrastructure to support bare-metal provisioning, performance benchmarking, and platform validation.

You will be instrumental in ensuring the stability, performance, and continuous improvement of our complex and mission-critical bare-metal HPC GPU clusters.

Key Responsibilities

  • Own the end-to-end lifecycle of AI compute systems, including GPU compute, NVSwitch, and platform firmware (BIOS, GPU, NIC, and storage device firmware).
  • Define, maintain, and enforce supported firmware and driver compatibility matrices across hardware generations, operating systems, kernels, and AI software stacks.
  • Lead firmware qualification and regression testing to ensure updates do not introduce performance degradation, instability, or compatibility issues...
