About The Position

Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. At its core, datacenter availability isn't just a metric but a promise of continuity. Identifying availability improvements and opportunities across Microsoft datacenters is therefore imperative, because it allows our operational cloud to keep scaling in a safe, secure, and reliable manner for our customers.

The Continuous Evaluation Program (CEP) is a strategic initiative within Microsoft’s global datacenter operations, designed to systematically assess, monitor, and optimize the ongoing operational readiness of our infrastructure. CEP plays a critical role in strengthening Microsoft’s availability, business reputation, and customer experience by proactively identifying risks, mitigating exposure, and driving consistency in operational excellence. As we accelerate our speed to market, CEP ensures scalable, reliable, and high-quality solutions through continuous evaluation and behavioral influence.

Key Program Focus Areas:

  • Availability: Provides impartial assessments of operational readiness across the datacenter fleet, ensuring consistent uptime and performance.
  • Standardized Evaluation Framework: Uses clear, measurable benchmarks derived from Microsoft’s datacenter operational standards to guide ongoing site evaluations.
  • Data-Driven Risk Mitigation: Leverages historical data to identify patterns in equipment and system failures, enabling proactive risk identification and elimination.
  • Scalable Operational Processes: Implements optimized and standardized procedures that support rapid growth without compromising reliability or quality.
  • Proactive Issue Detection: Identifies potential risks early, with the goal of preventing disruptions and ensuring higher availability across operational sites.
  • Culture of Continuous Improvement: Promotes innovation and agility, ensuring Microsoft’s datacenter infrastructure remains resilient, adaptable, and future-ready.

Requirements

  • Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 2+ years technical engineering experience OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience OR 12+ years relevant technical engineering experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • This position requires verification of US Citizenship to meet federal government security requirements.

Nice To Haves

  • Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years technical engineering experience OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 9+ years technical engineering experience.

Responsibilities

  • Align with Microsoft’s culture, objectives and operational standards.
  • Deliver a best-in-class, objective and impartial evaluation program monitoring Microsoft’s datacenter infrastructure, operational capabilities and performance against our standards, best practices and programs.
  • Drive global consistency of processes, procedures, and reporting with local operations teams.
  • Develop methodologies and metrics to validate datacenter performance, system control parameters, and operational efficiency against design intent.
  • Support the expansion of Microsoft’s datacenter portfolio, including the onboarding of new countries and facilities, through operational and site risk reviews.
  • Manage programs associated with operational readiness.
  • Review compliance with existing corrective and preventative maintenance programs to enhance operational readiness.
  • Evolve operational excellence with key focus areas of risk management, uptime availability and safety.
  • Focus on improved environmental performance, compliance, and risk management.
  • Support and promote improvement, best practices, and corrective and preventive actions.
  • Engage with appropriate partner teams to support initiatives, tasks, or projects.
  • Establish strong working relationships and engagement with our Engineering Groups (EGs), key partners, and Landlord partners (including contributing to MBRs and QBRs).
  • Work with regional and global peers to share and build best practices across the entire datacenter portfolio.
  • Partner with regional operational leadership and local teams to reduce high-impacting and human-error Critical Environment (CE) incidents year over year.
  • Monitor and verify the implementation and effectiveness of remediation action plans.
  • Create an environment to promote learning and innovation opportunities.
  • Obtain a clear understanding of Microsoft’s day-to-day operation, management, and maintenance expectations for all critical equipment, controls, and processes, including (but not limited to) operating procedures, standards, change management, and drills.