Site Reliability Engineer - CTJ - Poly

Microsoft•Redmond, WA

32d•Hybrid

About The Position

The Silver Edge team brings the power of Azure to the edge for our customers, tackling some of the most complex and mission-critical challenges in cloud and edge computing. Our mission is to provide stellar customer service so that our customers' missions can succeed. We support the new Azure Local product, which brings cloud computing to local hardware. We’re looking for a creative, hands-on leader who loves to mentor, relishes solving complex, ambiguous problems at scale, and is passionate about building resilient systems that matter.

As a Principal Site Reliability Engineer on the Silver Edge team, you will provide technical leadership in building out and ensuring the dependability of Azure Local services in three different sovereign clouds. You will be required to solve tough technical problems, coach a talented team of early-career engineers, and thrive in dynamic, sometimes chaotic environments. You will be responsible for coordinating with numerous engineering teams to resolve issues and for providing input on how to improve our products and services. In this role, you will accelerate your career, deepen your expertise in sovereign cloud solutions, and help shape the future of Azure edge solutions. We offer flexible work arrangements, including partial remote options, to support your best work.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings: The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. Government clearance.

Nice To Haves

Doctorate Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration OR Master's Degree in Computer Science, Information Technology, or related field AND 8+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 12+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
7+ years technical experience working with large-scale cloud or distributed systems.
5+ years of experience managing the reliability of mission-critical production workloads, which requires coordinating with a number of partners and teams
5+ years of programming experience with PowerShell, Python, C#, or similar
5+ years of experience troubleshooting and analyzing logs and metrics
5+ years of experience administering Windows Server (2016+) Hyper-V, storage, networking, and security capabilities

Responsibilities

You lead a team that maintains the reliability of Azure Local Cloud Services, including deployment, availability, security, performance, and customer satisfaction, for sovereign environments.
You collaborate with Engineering and Program Management partners to proactively identify and reduce customer issues through the design, testing, and implementation of software-based solutions.
You expand end-to-end technical expertise in the architecture, code, features, operations, and comprehensive use scenarios of products to drive continuous improvements. You drive teams across organizations to develop solutions to intelligently identify contributing factors and points of failure affecting availability, reliability, and performance of the product.
You support customer deployments and use of Azure Local and Azure Local disconnected operations.
You participate in on-call rotations and ensure teams across organizations are equipped to respond to incidents during regular on-call rotations. You keep relevant stakeholders and leadership apprised of details related to incident impact and status of resolution.
You mentor a highly skilled team of engineers, driving their growth and development and helping them execute projects that deliver and scale revolutionary improvements to the cloud.
