Infrastructure Reliability Engineer

STACK Infrastructure
Manassas, VA
Hybrid

About The Position

STACK INFRASTRUCTURE is seeking an Infrastructure Reliability Engineer with expertise in electrical systems to join its Critical Operations team. This role is crucial for ensuring the continuous performance, resilience, and advancement of infrastructure systems across STACK’s portfolio. The position demands in-depth technical knowledge of data center power and cooling systems, a meticulous approach to failure analysis, and a proactive strategy for risk mitigation. The engineer will lead investigations into electrical infrastructure failures, evaluate system performance under various conditions, collaborate with OEMs and vendors, and contribute to the evolution of maintenance standards and asset strategies. Additionally, the role involves contributing to availability reporting, improving event response, and monitoring risk trends to meet SLA commitments. The engineer will also influence the design review and turnover process, develop strategies to mitigate system failures, and partner with Operations, Engineering, and Construction to assess electrical design assumptions and identify long-term reliability risks.

Requirements

  • 5–8 years of experience in critical infrastructure environments (e.g., data centers, substations, power generation, or utility systems)
  • Strong technical fluency in mission-critical electrical systems, including power distribution architecture, UPS systems, generators, grounding methodologies, protective relays, switchgear, controls integration, and power quality analysis
  • Experience analyzing electrical failures through waveform data, event logs, relay coordination, commissioning findings, or forensic troubleshooting
  • Working knowledge of electrical system design intent versus operational field realities, including maintainability, equipment compatibility, and fault response
  • Hands-on experience with root cause analysis and reliability methodologies (e.g., FMEA, RCM)
  • Demonstrated ability to work across disciplines (Ops, Eng, Vendors, Construction) to resolve complex technical issues
  • Expertise with commissioning (Cx) and infrastructure design review processes
  • Ability to analyze performance data and translate findings into practical improvements
  • Bachelor's degree in Engineering or equivalent experience with high technical competency
  • Must be eligible to work in the United States
  • Must pass a comprehensive background screening

Nice To Haves

  • Strong communicator, persuasive and clear, blending analytics with experience in decision-making
  • Ability to handle multiple priorities while balancing urgent requests with shifting timelines and deliverables
  • Team builder, understanding and developing strengths of resources while formulating long-term plans for team growth and success
  • Naturally curious and driven toward continual improvement, analyzing successes for future learning

Responsibilities

  • Lead deep-dive investigations and RCAs for electrical infrastructure failures, including UPS systems, switchgear, breakers, relays, generators, grounding systems, STS behavior, VFD interactions, controls, and power quality disturbances
  • Evaluate electrical system performance under abnormal, fault, or degraded conditions (e.g., grounding faults, harmonics, transient events, protective coordination, voltage distortion, transfer events) to identify systemic vulnerabilities
  • Engage OEMs and vendors to challenge technical assumptions and advocate for long-term improvements
  • Support the evolution of maintenance standards and asset strategy for high-risk or complex systems (e.g., power distribution, cooling)
  • Collaborate with Workforce Development to enhance technical training for site teams based on lessons from event investigations
  • Contribute to availability reporting, event response improvement, and risk trend monitoring to ensure SLA commitments are met
  • Inform and influence the design review and turnover process by identifying gaps in infrastructure handoffs, system limitations, or commissioning practices
  • Develop system-level failure mode mitigation strategies that improve uptime performance and reduce repeat incidents
  • Partner with Operations, Engineering, and Construction to review electrical design assumptions, protective schemes, equipment compatibility, and commissioning practices to identify long-term reliability risks prior to or following operational events

Benefits

  • Healthcare
  • Dental Care
  • Vision Insurance
  • Life Insurance
  • Paid Time Off
  • Paid Leave Programs
  • 401(k) program
  • Flexible spending accounts
  • Cell phone subsidy