Generative AI is transforming how people create, collaborate, and communicate, redefining productivity across Microsoft 365 for customers worldwide. At Microsoft, we operate one of the largest collaboration and productivity platforms in the world, serving hundreds of millions of consumer and enterprise users. Delivering these AI experiences at scale requires solving some of the hardest efficiency challenges in modern AI systems.

We are an applied research team focused on advancing efficiency across the AI stack, spanning models, ML frameworks, cloud infrastructure, and hardware. We drive mid- and long-term product innovation through close collaboration with research and product teams across the company. We share our research internally and externally through internal technical reports, academic conference publications, open-source releases, and patents. Beyond producing research, we take responsibility for carrying ideas through prototyping, validation, and production, with a bias toward real-world impact.

The candidate will work across the full stack, from large-scale serving systems to hardware- and kernel-level optimizations, exploring algorithmic, systems, and hardware/software co-design techniques. Areas of focus include batching, routing, scheduling, caching, endpoint configuration, and GPU-architecture-aware optimizations. This role emphasizes end-to-end ownership: identifying high-impact problems and driving research ideas through prototyping, validation, and deployment to deliver measurable customer impact.

For more information, see https://aka.ms/efficient-ai
Job Type: Full-time
Career Level: Mid Level
Education Level: Ph.D. or professional degree