Senior SRE: AI/ML HPC Infra & GPU Cluster

Boson AI

📍 Toronto, ON, Canada

Full-time Other-General Posted February 03, 2026

Job Description

A leading technology company in Toronto is seeking a Senior Site Reliability Engineer. You will manage one of the most advanced GPU clusters and operate across the full lifecycle of HPC infrastructure. Key responsibilities include troubleshooting, optimizing, and automating processes to ensure high performance. The ideal candidate has more than 5 years of SRE or HPC operations experience and is proficient in Linux and Kubernetes. This role offers the opportunity to work with cutting-edge technology in a collaborative environment.

Job Details

Location: Toronto, ON, Canada

Job Type: Full-time