Senior Reliability Engineer

Stream Data Centers
Dallas, TX
Hybrid

About The Position

The Reliability Engineering team is responsible for identifying, understanding, and managing reliability risks that could adversely affect plant or business operations. For each major system, the team evaluates and improves the reliability and performance of existing critical infrastructure and sustains equipment operational availability through maintenance program strategies and improvements. The team also quantifies and measures component, system, and portfolio lifecycle health, provides strategic support to portfolio operations, and gives reliability and maintainability feedback to the design and construction teams for future design considerations and optimization efforts.

Requirements

  • Bachelor’s degree in Engineering or equivalent technical field; advanced degree preferred.
  • 10+ cumulative years of experience with industrial or commercial engineering in Mission Critical facilities
  • Organized, with the ability to set priorities and meet deadlines and budgets
  • Demonstrated ownership of enterprise-level reliability programs
  • Experience reading, interpreting, and creating construction drawings, specifications, and submittal documents
  • Ability to carry design concepts through exploration, development, and into deployment/mass production quality standards
  • Advanced understanding of mechanical and/or electrical equipment and design related to data centers (including but not limited to uninterruptible power supplies, diesel generators, electrical switchgear, power distribution units, variable frequency drives, automatic/static transfer switches, chillers [air-cooled and water-cooled], pumps, cooling towers, heat exchangers, CRAC/CRAHs, fans, air economizers, water treatment, etc.)
  • Experience leading RCAs and failure mode & effects analysis
  • Experience leading or overseeing FWT and FAT
  • Experience overseeing supplier quality audits
  • Experience using physics-of-failure approach for analytical and empirical risk identification and assessment
  • Possess excellent communication and writing skills and attention to detail
  • Proven success driving MTBF improvement and AFR reduction at scale

Nice To Haves

  • Professional Engineering (PE) license or progress toward certification.
  • Experience with large-scale data center deployments across multi-regional footprints.
  • 12+ years of experience with data centers
  • Experience using a variety of web-based and other software tools for calculation and data processing
  • Direct experience with the design, construction, operation, or maintenance of mission critical facilities, especially data centers
  • Experience as resident engineer or hands-on (in the field) design consultant or owner’s engineer
  • Knowledge of building codes and regulations for your region
  • Experience applying proactive, cost-effective reliability approaches throughout the product design, manufacture, and deployment stages
  • Six Sigma black-belt certification
  • Experience redesigning/retrofitting critical systems for improved performance/availability

Responsibilities

  • Drive the Causal Analysis and root-cause program to mitigate recurring equipment failures and develop strategic solutions
  • Review BOD and design templates and provide feedback to the Design and Construction team
  • Provide 360-degree review of new long-lead and critical equipment and vendor qualifications
  • Perform strategic internal audits and assessments, including driving supplier RCAs and performing vendor business reviews as an input to selection criteria
  • Conduct industry outreach and research new technologies
  • Partner with Infrastructure Operations & Availability Team to develop reliability-centered maintenance programs that reduce mechanical, electrical & control systems maintenance complexity and minimize equipment downtime
  • Partner with Field Engineering to research and develop system component upgrades to ensure reliability and combat obsolescence
  • Develop risk management plans that will anticipate and mitigate reliability-related risks that could adversely impact plant operation
  • Partner with Field Engineering to identify single points of failure (SPOFs) and develop strategic solutions
  • Partner with Field Engineering and Site Operations Teams during event response to develop future-proofing mitigations for failures
  • Review and approve input to the SDC Operations OPR
  • Partner with Field Engineering to develop technical and field service bulletins to address complex equipment issues and discrepancies.
  • Provide technical guidance, training, and support for implementation
  • Influence design and development teams, procurement, and external partners to optimize DC performance and customer availability
  • Partner with cross-functional departments for compliance related activities

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Wellness Resources
  • Annual Bonus
  • Flexible Time Off (Vacation)