Microsoft Digital (MSD) builds and manages the critical products and services that Microsoft runs on. We boldly pursue big ideas that power transformational advances at Microsoft and for our customers, while helping Microsoft teams work smarter, faster, and more securely every day. Microsoft Digital employees have deep technical and business expertise, customer insights, and a clear point of view that comes from first-hand, large-scale experience with Microsoft and industry solutions. We are engineers, technology leaders and experts, digital transformation change agents, and customer advocates.

We are seeking a Principal Service Reliability Engineer (SRE) to lead the reliability strategy for mission-critical, large-scale distributed systems. This role operates at a system and organizational level, driving reliability engineering practices across services, influencing architecture decisions, and establishing scalable frameworks for availability, performance, and operational excellence. The Principal SRE defines reliability standards (SLOs, SLIs, and error budgets) and partners with engineering, product, and platform teams to design, build, and operate resilient systems at enterprise scale. This role is accountable for reducing systemic risk, eliminating operational toil, and advancing toward autonomous, self-healing platforms.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

#MSD #MSDJOBS
Job Type: Full-time
Career Level: Principal
Education Level: No Education Listed