Azure is Microsoft’s central cloud infrastructure, hosting both public cloud offerings and a wide range of Microsoft-internal cloud-scale services. Cloud computing is a highly competitive and rapidly growing market, and Azure aims to be a leader across its platform and services. Within Azure, the Azure Compute team provides core infrastructure for hosting virtual machines (VMs), containers, and other workloads.

Capacity management is a critical discipline in cloud computing. It ensures sufficient capacity across regions, allocation domains, and hardware infrastructure to meet customer demand, while optimizing efficiency to avoid overspending and to reduce cost of goods sold (COGS) and capital expenditures (CAPEX). At Azure’s scale, managing this balance across the entire compute fleet is complex, and improvements can prevent allocation failures and deliver significant savings.

The Azure Compute Capacity and Efficiency (AC2E) team manages all aspects of capacity and efficiency across the fleet. Our primary goal is to provide an automated and optimized tracking and management system. This system, which includes the Capacity Management Automation System (CMAS), uses advanced algorithms and artificial intelligence (AI) to predict capacity risks and execute mitigation actions directly within the Azure Compute platform. As a member of the team, you will collaborate with engineers, program managers, and data scientists to define business problems and deliver solutions from design to production, influencing strategic decisions that impact capacity and efficiency.