Site Reliability Engineer II - CTJ - Top Secret

Microsoft | Atlanta, GA
206d | $100,600 - $199,000 | Onsite

About The Position

Do you have a passion for high-scale services and working with some of Microsoft's most critical customers? We're looking for a Site Reliability Engineer II with the right mix of software development, online services experience, and passion for quality to envision, design, and deliver Office 365 government cloud service offerings.

Office 365 is at the center of Microsoft's cloud-first, devices-first strategy, as it brings together cloud versions of our most trusted communication and collaboration products, like Exchange, SharePoint, and Teams, with our cross-platform desktop suites and mobile apps. The Office 365 Enterprise Cloud team works with Microsoft's largest enterprise and government customers to deliver features that meet their specific needs and enable cloud adoption. As you would expect, our customers have the highest expectations for feature quality, security, reliability, availability, and performance.

The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. You will need strong collaboration skills to work closely with other engineering teams and ensure that services and systems are highly stable and performant, meeting the expectations of our government customers and users.

At Microsoft, we can offer you great teams, exciting challenges, and a fun place to work. The work environment empowers you to have a positive impact on millions of end users.

Requirements

  • Master's Degree in Computer Science, Information Technology, or related field OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ years of technical experience in software engineering, network engineering, or systems administration OR 4+ years of technical experience in software engineering, network engineering, or systems administration.
  • Candidates must have an active Top Secret clearance and be willing to upgrade to TS/SCI (with polygraph).
  • Ability to meet Microsoft, customer and/or government security screening requirements.

Nice To Haves

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ years of technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years of technical experience in software engineering, network engineering, or systems administration OR 5+ years of technical experience in software engineering, network engineering, or systems administration.

Responsibilities

  • Demonstrates expertise in distributed systems design, interactions between cloud technology layers and components, common dependencies at scale, and the code that defines infrastructures.
  • Identifies and recommends optimal configurations of cloud technology solutions and modifies the code base that defines systems or cloud technologies to improve the reliability and operability of supported products.
  • Develops an understanding of the code, features, and operations of specific products at scale to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance.
  • Participates in onboarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those products.
  • Researches and maintains an awareness of industry trends, advances in distributed systems and cloud technologies, new tools, and/or processes for maintaining and improving product availability, reliability, efficiency, observability, and/or performance.
  • Contributes to the implementation of new solutions within their team by identifying ways they can be applied to solve persistent problems.
  • Independently develops code or scripts that automate the performance of repetitive and easily scalable operations processes.
  • Leverages technical expertise and telemetry analysis across a range of components and/or features to identify patterns and opportunities to implement configuration and data changes.
  • Identifies opportunities to leverage existing tools and automation to enable product engineering teams to increase the velocity at which they can reliably and safely implement changes in production.
  • Designs, develops, and maintains telemetry pipelines and monitoring tools that detail operations metrics of product components and features operating at scale.
  • Independently performs analyses using existing tools and/or models to identify insights and shares them with product engineering teams.
  • Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting issues, and deploying appropriate fixes to resolve root cause(s).
  • Develops alerts and instrumentation across components and features to monitor product capacity and resource demands.

Benefits

  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect