Software Engineer (GPU Infrastructure, High Performance Computing)

Cohere

📍 toronto, on, Canada

Full-time Other-General Posted June 13, 2026

Apply Now Similar Jobs

Job Description

                        Requirements

Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments

Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads

Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutions

Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads

Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges

Self-directed problem‑solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast‑paced environment

If some of the above doesn’t line up perfect...

Apply for this Position

Ready to join Cohere? Click the button below to submit your application.

Submit Application

Job Details

Location

toronto, on, Canada

Job Type

Full-time