Site Reliability Engineer II

Microsoft•Redmond, WA

153d•$100,600 - $199,000

About The Position

You thrive on solving hard problems at scale, and you're passionate about keeping critical services reliable, performant, and secure. In our team, you'll work on services that millions of customers depend on every day. You'll partner directly with product engineering teams to influence design, improve operations, and drive automation.

As a Site Reliability Engineer II, you will help design, build, and run distributed services at global scale. You'll use your software engineering skills to eliminate toil, improve system resiliency, and deliver meaningful telemetry. This opportunity will allow you to accelerate your career growth, learn how to operate complex cloud services at scale, and develop deep expertise in modern reliability engineering practices.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
2+ years of coding experience in languages such as C#, Python, and PowerShell.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.

Nice To Haves

3+ years of coding experience in languages such as C#, Python, and PowerShell.
Experience with monitoring, logging, and distributed systems troubleshooting.
Knowledge or hands-on experience in AI/ML systems.
2+ years of technical experience working with large-scale cloud or distributed systems.

Responsibilities

Participate in design and code reviews to ensure services are reliable, scalable, and secure.
Operate services through on-call rotations, incident response, and post-mortems.
Partner with product teams to drive improvements in resiliency, cost efficiency, and performance.
Develop automation to reduce manual operations and improve recovery time.
Build and maintain observability (metrics, logs, traces) that drives data-driven engineering decisions.
Contribute to a blameless culture of learning through continuous improvement and knowledge sharing.

Benefits

Industry-leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Professional, Scientific, and Technical Services

Education Level

Master's degree
