Job Description
A leading technology company in Toronto is seeking a Senior Site Reliability Engineer. You will manage one of the most advanced GPU clusters and work across the full lifecycle of its HPC infrastructure. Key responsibilities include troubleshooting issues, optimizing performance, and automating operational processes. The ideal candidate has more than 5 years of SRE or HPC operations experience and is proficient in Linux and Kubernetes. This role offers the opportunity to work with cutting-edge technology in a collaborative environment.