Job Description

**Overview**

The Azure Compute team builds a fault-tolerant, distributed system on top of commodity datacenter hardware to deliver infrastructure for hosting cloud applications in virtual machines (VMs). The team creates the illusion that resources are limitless, infinitely elastic, and always available.

This role is part of the Availability Platform team within Azure Compute, which focuses on ensuring every Azure virtual machine achieves a Service Level Agreement (SLA) of 99.99 percent or higher. Meeting this target requires innovative thinking, data-driven decisions, and intelligent automation. The team owns services that monitor the health of millions of Azure machines and the control plane services that make repair decisions. We use artificial intelligence (AI) and machine learning to build predictive failure models that proactively live-migrate virtual machines before failures occur, minimizing customer impact and improving platform resilience.

We are ex...

Apply for this Position

Ready to join Microsoft Corporation? Click the button below to submit your application.

Submit Application