Principal Software Engineering Manager

Microsoft
Redmond, WA
$142,800 - $304,200
Hybrid

About The Position

M365 Copilot Inference is a high-impact engineering team advancing applied AI and large-scale machine learning across Microsoft. The team designs and operates the platform powering Microsoft 365 Copilot experiences, running at massive GPU (Graphics Processing Unit) scale across multiple regions and SKUs in global datacenters. It builds core LLM (large language model) API (Application Programming Interface), routing, and capacity control plane services to deliver low-latency, highly available Copilot experiences.

We’re hiring a Principal Software Engineering Manager to lead a team focused on control plane automations for capacity buildout. This is a hands-on technical leadership role centered on how Copilot capacity is requested, planned, deployed, and operated. The manager will contribute to capacity planning and custom model deployment automation, partnering closely with peer managers and adjacent areas to shape how the broader control plane evolves. The space spans intake, planning, deployment, fleet health, and unified control plane surfaces.

This role is based out of Redmond, WA, and employees are expected to work from a designated Microsoft office at least three days a week.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • 4+ years people management experience.
  • Experience as an engineering manager leading IC (individual contributor) teams building distributed systems, platform services, or cloud infrastructure at scale.
  • Technical depth — able to participate in design reviews, debug live-site issues, and raise the engineering bar through code and design feedback.
  • Track record shipping production services with live-site and on-call ownership.
  • Experience building automation and tooling that replaces manual operational work.
  • Ability to work across team and org boundaries to align on dependencies, surface trade-offs, and drive execution.
  • Hiring, coaching, and people-development track record.
  • Ability to take an ambiguous charter and turn it into a focused roadmap with clear priorities.
  • Experience with AI/ML infrastructure, GPU fleets, or large-scale inference or training systems.
  • Experience with capacity planning, fleet management, or supply/demand optimization at scale.
  • Familiarity with Azure, M365, or AI workload cost models (COGS, utilization, throughput).
  • Background building control planes, orchestration platforms, or automation systems from 0→1.
  • Experience hiring and growing IC teams in a high-growth platform org.

Responsibilities

  • Lead and grow a team of software engineers building control plane services and automations across the capacity buildout area.
  • Drive technical design and execution for capacity automation — intake, planning, deployment, fleet health, and control plane components — prioritizing the highest-impact work for Copilot capacity.
  • Replace manual, ticket-driven capacity workflows with automated, data-driven systems; reduce time from capacity request to production traffic for priority workloads.
  • Own live-site, reliability, and operational excellence for the services your team builds; establish SLAs, metrics, and on-call practices.
  • Partner with peer engineering managers on adjacent capacity areas, and with partner teams across M365 Core, AI Core, Azure, and Microsoft Research to align on dependencies and unblock execution.
  • Coach and grow senior and mid-level engineers; raise the engineering bar; recruit strong platform talent into the team.
  • Help shape how the capacity automation area is sliced and scoped over time as the platform and the org evolve.

Benefits

  • Certain roles may be eligible for benefits and other compensation.