Critical Environment Ops Technician

Microsoft, Santa Clara, CA
Onsite

About The Position

As a Critical Environment Technician in Microsoft’s Cloud Operations & Innovation (CO+I) team, you will be responsible for maintaining the critical infrastructure that ensures the continuous operation of Microsoft's Datacenters. This role involves coordinating with suppliers and vendors, identifying safe versus unsafe work environments, possessing a hands-on understanding of critical environment equipment, performing various types of maintenance, responding to onsite incidents while collaborating with other critical facilities professionals, and utilizing telemetry and other platforms to monitor equipment performance and operations.

Microsoft’s CO+I group is dedicated to powering cloud services for products like Bing, Office 365, Xbox, OneDrive, and the Microsoft Azure platform. The team emphasizes personal and professional development, offering training, career rotation programs, diversity & inclusion events, and professional certifications. The infrastructure managed by CO+I includes over 200 datacenters globally, supporting billions of customers and millions of businesses. The company prioritizes environmental sustainability and optimization in its datacenter design and operations. Microsoft's mission is to empower individuals and organizations, fostering a culture of growth, innovation, respect, integrity, and accountability.

This role may require business travel (0-25% of the time) to support other metros and the ability to work 12-hour shifts, which may include evenings, nights, weekends, and/or holidays.

Requirements

  • High School Diploma, GED, or equivalent
  • 1+ year(s) mission critical services work/applied learning experience (e.g., high availability assembly/manufacturing/critical infrastructure environments such as data centers, oil and gas refineries, hospitals, pharmaceutical, manufacturing, or related fields), or equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • Ability to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • Citizenship verification (requires verification of citizenship with a valid passport due to supporting United States federal, state, and/or local government agency customers).

Nice To Haves

  • Associate's Degree or technical trade certification (e.g., military, trade school), or higher-equivalent education AND 2+ years mission-critical services experience (e.g., high-availability assembly/manufacturing/critical infrastructure environments such as data centers, oil and gas refineries, hospitals, pharmaceutical, manufacturing, or related fields) OR High School Diploma, GED, or equivalent AND 3+ years mission critical services experience (e.g., high-availability assembly/manufacturing/critical infrastructure environments such as data centers, oil and gas refineries, hospitals, pharmaceutical, manufacturing, or related fields) OR equivalent experience.

Responsibilities

  • Understands, follows, and ensures safety and security requirements (e.g., job hazard assessments [JHAs], toolbox talks), and business processes and procedures are met, to properly perform work in a safe, quality, and reliable manner in accordance with applicable Authority Having Jurisdiction (AHJ) regulations and Microsoft requirements.
  • Recognizes safe versus unsafe working conditions and responds accordingly (e.g., stop/pause tasks, stand down vendors where necessary).
  • Escalates immediately when unsafe working conditions are observed and promotes a safe working culture to empower less experienced team members.
  • Participates in required meetings, trainings, and necessary handoffs.
  • Assesses and identifies appropriate resources and equipment necessary to fully support environmental health and safety (EHS) objectives.
  • Actively maintains safe working conditions at all times.
  • Proactively ensures safety and security requirements are followed and met for the work of themselves and others.
  • Prepares and submits required reports (e.g., turnover, preventative maintenance [PM]) as assigned following preexisting scripts and templates, or using ad hoc methods required to support trending and analysis (e.g., Maintenance data, equipment trending data).
  • Develops methods of operating procedure (MOPs), standard operating procedures (SOPs), and/or digital methods of operating procedures (DMOPs) for devices and disciplines within their coverage and expertise to ensure safe and reliable execution.
  • Documents completed work using approved tools and procedural templates for more experienced technician review.
  • Completes and provides coaching to support less experienced technicians for mandatory, technical, and procedural training assignments.
  • Analyzes findings from reports and documents observations.
  • Performs various types of maintenance (e.g., planned, predictive, corrective) and repairs following methods of procedure (MOPs), standard operating procedures (SOPs), and digital methods of operating procedures (DMOPs) for multiple disciplines and one or more types of equipment (e.g., electrical, mechanical, cooling systems) with minimal supervision - in consideration of Task Hazard Analysis (THA), Method Statement of Work (MSOW), or varying permit requirements.
  • Communicates and/or escalates maintenance activities per established process and procedure.
  • Prioritizes maintenance activities as required and/or appropriate.
  • Documents tasks or issues during maintenance activities within appropriate systems per process and procedure as needed.
  • Performs maintenance tasks and repairs that can be performed with minimal oversight.
  • Follows recommended maintenance schedules.
  • Oversees maintenance tasks within a single discipline or area of expertise.
  • Maintains all systems and equipment in a safe and professional manner and understands levels of risk (LORs) associated with varying types of maintenance with established mastery of maintaining systems of a specific discipline.
  • Provides necessary escort to third-party contractors, sub-contractors, vendors, and service providers on site based on the appropriate procedure levels of risk (LOR).
  • Takes part in getting third-party work underway (e.g., making sure systems are properly energized/deenergized), ensuring the work is started and completed in a safe manner in accordance with standard practices, procedures, and Authority Having Jurisdiction (AHJ) regulations.
  • Ensures work performed by suppliers/vendors is performed to scope, all documentation is performed correctly, and escalates as appropriate.
  • Recognizes circumstances when to stop supplier/vendor work to address potential and/or identified concerns.
  • Coordinates per appropriate LOR applicable to preventative and/or corrective maintenance.
  • Identifies and recommends procedure corrections if/when errors are detected or when appropriate.
  • Coordinates and schedules supplier/vendor on-site activities.
  • Reviews and completes appropriate work orders to support approval of vendor supplier field service reports or invoices.
  • Processes method statement of work (MSOW) documents.
  • Coordinates activities and associated schedules with contractors.
  • Performs inspections of equipment in a facility.
  • Participates in testing and commissioning activities.
  • Documents issues found in troubleshooting process within appropriate systems per process and procedure as needed.
  • Ensures equipment and system settings are consistent with established parameters and designs.
  • Determines when troubleshooting efforts are deemed adequate and communicates or escalates to suppliers, engineers, or more experienced colleagues as needed.
  • Has a hands-on understanding of how equipment works within the disciplines in which they have been trained and how to troubleshoot to a subsystem level.
  • Provides consultation to less experienced colleagues with troubleshooting systems and problems.
  • May lead efforts to troubleshoot issues and identify root causes.
  • Works on advanced tasks (e.g., vendor contact, escalations) independently.
  • Serves as a subject matter expert in critical environments-related systems within the data center, and advises less experienced colleagues on such topics.
  • Possesses an understanding of and operates equipment and systems within a set discipline (e.g., electrical, mechanical, controls) with knowledge of the interactions between them and overall operation of a data center.
  • Operates all systems and equipment in a safe and professional manner.
  • Inspects and supervises critical environment-related facility equipment (e.g., controls, heating, ventilation, and air conditioning [HVAC], mechanical systems), building, and grounds for unsafe or abnormal conditions to develop and analyze trends.
  • Understands critical system alarms for single discipline(s) of equipment, their meanings, and engages with appropriate escalation processes or procedures.
  • Recognizes circumstances where execution would be considered safe to proceed.
  • Performs various inspections and validations of equipment performance.
  • Monitors the performance from central monitoring locations (i.e., Facility Operations Centers) of maintenance and operations of equipment (e.g., electrical, mechanical, fire/life safety) within the data center.
  • Escalates per applicable policies and standards.
  • Utilizes telemetry, control systems, and other platforms to monitor site status, analyze past and current events, as well as other processes, and can identify all alarms.
  • Uses technical expertise, prior experience, and device analytics to recognize trends with equipment behavior and checks potential issues as they arise.
  • Advises less experienced colleagues on issues found while monitoring applicable CE systems.
  • Performs all monitoring equipment repair, replacement, and maintenance work, which meets or exceeds Microsoft Service Level Agreement (SLA) requirements.
  • Utilizes internal computerized maintenance management system (CMMS) to track all equipment assets and to complete work order requests for maintenance work.
  • Tracks hours for performed tasks within applicable task management systems.
  • Adds required data and documents, logs changes, and maintains procedures related to building management systems and reports.
  • Properly signals spare equipment and parts utilization within maintenance work orders.
  • Safely and quickly responds to and leads an onsite incident response team for all abnormal conditions that impact operations, and coordinates with other critical facilities professionals to perform corrective repairs, without supervision.
  • Gathers necessary information and creates incident timelines/data, root-cause analyses, and/or action items following an abnormal condition as required.
  • Identifies and contacts/engages appropriate parties to mitigate incidents as they occur.
  • Develops new or follows preexisting emergency operating procedures (EOPs), methods of procedure (MOPs), standard operating procedures (SOPs), and digital methods of operating procedures (DMOPs) in relation to incidents.
  • Directly provides emergency monitoring response to irregular or malfunctioning conditions.

Benefits

  • Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

High school or GED

Number of Employees

5,001-10,000 employees
