CoreAI is at the forefront of Microsoft’s mission to redefine how software is built and experienced. We are responsible for building the foundational platforms, services, programming models, and developer experiences that power the next generation of applications using Generative AI. Our work enables developers and enterprises to harness the full potential of AI to create intelligent, adaptive, and transformative software.

The AI Core Infrastructure team, part of the AI Platform team in the CoreAI organization, is responsible for the large-scale, highly reliable, and efficient GPU management infrastructure and the inference and training platforms that power all of Microsoft’s AI workloads, such as M365 Copilot, GitHub Copilot, Microsoft Copilot, AI Foundry’s inference and fine-tuning offerings for OpenAI and open-source models, and many more.

As a Principal Engineer on the Observability team, you’ll shape the architecture and strategy for how customers monitor, troubleshoot, and scale their AI training workloads. You’ll work across ML infrastructure, distributed systems, and observability to power large-scale pre-training, post-training, and fine-tuning on some of the world’s largest AI supercomputers.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees