Senior Reliability Engineer

Stream Data Centers•Dallas, TX

2d•Hybrid

About The Position

The Reliability Engineering team identifies, analyzes, and manages reliability risks that could adversely affect plant or business operations. For each major system, the team evaluates and improves the reliability and performance of existing critical infrastructure and sustains equipment operational availability through maintenance program strategies and improvements. The team also quantifies and measures component, system, and portfolio lifecycle health, provides strategic support to portfolio operations, and gives reliability and maintainability feedback to the design and construction teams for future design considerations and optimization efforts.

Requirements

Bachelor’s degree in Engineering or equivalent technical field; advanced degree preferred.
10+ cumulative years of experience with industrial or commercial engineering in Mission Critical facilities
Organized, with the ability to set priorities and meet deadlines and budgets
Demonstrated ownership of enterprise-level reliability programs
Experience reading, interpreting, and creating construction drawings, specifications, and submittal documents
Ability to carry design concepts through exploration, development, and deployment to mass-production quality standards
Advanced understanding of mechanical and/or electrical equipment and design related to data centers (including but not limited to uninterruptible power supplies, diesel generators, electrical switchgear, power distribution units, variable frequency drives, automatic/static transfer switches, chillers [air-cooled and water-cooled], pumps, cooling towers, heat exchangers, CRAC/CRAHs, fans, air economizers, water treatment, etc.)
Experience leading RCAs and failure mode and effects analysis (FMEA)
Experience leading or overseeing FWT and FAT
Experience overseeing supplier quality audits
Experience using physics-of-failure approach for analytical and empirical risk identification and assessment
Excellent communication and writing skills and attention to detail
Proven success driving MTBF improvement and AFR reduction at scale

Nice To Haves

Professional Engineer (PE) license or progress toward licensure.
Experience with large-scale data center deployments across multi-regional footprints.
12+ years of experience with data centers
Experience using a variety of web-based and other software tools for calculation and data processing
Direct experience with the design, construction, operation, or maintenance of mission critical facilities, especially data centers
Experience as resident engineer or hands-on (in the field) design consultant or owner’s engineer
Knowledge of building codes and regulations for your region
Experience applying proactive, cost-effective reliability approaches throughout product design, manufacture, and deployment stages
Six Sigma black-belt certification
Experience redesigning/retrofitting critical systems for improved performance/availability

Responsibilities

Drive the Causal Analysis and root-cause program to mitigate recurring equipment failures and develop strategic solutions
Review BOD and design templates and provide feedback to the Design and Construction team
Provide 360-degree review of new long-lead and critical equipment and vendor qualifications
Perform strategic internal audits and assessments, including driving supplier RCAs and performing vendor business reviews as an input to selection criteria
Conduct industry outreach and research new technologies
Partner with Infrastructure Operations & Availability Team to develop reliability-centered maintenance programs that reduce mechanical, electrical, and control systems maintenance complexity and minimize equipment downtime
Partner with Field Engineering to research and develop system component upgrades to ensure reliability and combat obsolescence
Develop risk management plans that will anticipate and mitigate reliability-related risks that could adversely impact plant operation
Partner with Field Engineering to identify single points of failure (SPOFs) and develop strategic solutions
Partner with Field Engineering and Site Operations Teams during event response to develop future-proofing mitigations for failures
Review and approve input to the SDC Operations OPR
Partner with Field Engineering to develop technical and field service bulletins to address complex equipment issues and discrepancies
Provide technical guidance, training, and support for implementation
Influence design and development teams, procurement, and external partners to optimize DC performance and customer availability
Partner with cross-functional departments for compliance-related activities