M365 Copilot inference is a high-impact engineering team advancing applied AI and large-scale machine learning across Microsoft. We design and operate the platform powering Microsoft 365 Copilot experiences, delivering intelligent capabilities to millions of users. Our team owns one of the world’s largest AI inference platforms, operating at massive GPU (Graphics Processing Unit) scale across global datacenters. We build the core LLM (large language model) API (Application Programming Interface) and routing services that enable low-latency, highly available AI experiences, and we continuously push the boundaries of performance, scalability, and efficiency.

As a Principal Software Engineering Manager, you will lead a strategic initiative focused on maximizing throughput per GPU across the Copilot inference stack. The role drives inference engine efficiency: optimizing model execution and runtime performance, improving throughput per GPU, reducing cost per query, and unlocking capacity without additional hardware investment.

This role is based in Redmond, WA, and employees are expected to work from a designated Microsoft office at least three days a week.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Job Type
Full-time
Career Level
Principal