Site Reliability Engineer

Microsoft
Redmond, WA

About The Position

The IDEAS organization's mission is to unlock the power of data to deliver actionable insights and personalized experiences at scale, thereby driving usage, engagement, and revenue across Microsoft 365, Azure, Windows, and more. As part of the team, you’ll collaborate with teams company-wide, from product engineers to data scientists, using cutting-edge technology (big data platforms, cloud analytics, AI Copilots) to solve complex problems.

Specifically, as a Site Reliability Engineer, you will help drive automation, incident response, and data-driven improvements to ensure our services meet stringent reliability and performance goals. You’ll collaborate across engineering teams, contribute to live site operations, and help shape the future of our systems at scale while ensuring that they are secure and compliant.

Come build the data future at Microsoft. Joining the IDEAS organization means joining a team that is transforming how Microsoft harnesses data and, in the process, empowering customers and partners with smarter, AI-infused experiences. It’s not just a job; it’s a chance to lead a data revolution from within. If you’re excited by the idea of turning an enterprise’s data into insights, intelligence, and impact, consider applying for the Microsoft IDEAS organization.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Associate's Degree in Computer Science, Information Technology, or related field, OR Bachelor's Degree in Computer Science, Information Technology, or related field, OR equivalent experience.
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings. Microsoft Cloud Background Check: this position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
  • 1+ year(s) experience in automating root cause analysis and mitigation of incidents.
  • 1+ year(s) experience with automation, live site operations, and incident response in large-scale cloud or distributed systems.
  • Proven experience coding in at least one programming or scripting language including, but not limited to, C#, Java, Python, or PowerShell.
  • Experience using analytical and problem-solving skills, telemetry, and data to drive operational decisions.
  • Proven experience using communication and collaboration skills to work effectively across teams.

Responsibilities

  • Serve as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within SLA timelines, and driving post-incident learnings.
  • Develop, enhance, and maintain automation for deployment, operations, and incident mitigation to improve service reliability and reduce manual intervention.
  • Instrument services for observability, collect and analyze telemetry and health metrics, and use data-driven insights to guide reliability and performance improvements.
  • Collaborate closely with engineering partners and stakeholders to align goals, share operational insights, and deliver user-centric solutions.
  • Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements.
  • Ensure compliance with security, privacy, and accessibility standards throughout service onboarding and operations.
  • Stay current with industry trends and internal tools to continuously improve reliability, performance, and observability at scale.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Entry Level
  • Education Level: Associate degree
  • Number of Employees: 5,001-10,000
