IOC Incident Commander

IREN
Dallas, TX
Onsite

About The Position

IREN is a leading AI Cloud Service Provider, delivering large-scale GPU clusters for AI training and inference. IREN’s vertically integrated platform is underpinned by its expansive portfolio of grid-connected land and data centers in renewable-rich regions across the U.S. and Canada. With 100% renewable energy, we build, own and operate our data centers and take pride in being at the forefront of sustainable solutions for the ever-evolving applications of high-performance compute. We believe that human progress is invaluable, but it should be done in the right way: responsibly, sustainably and with a positive impact on the communities we operate in.

We are seeking a highly capable Incident Commander to operate at the center of critical operations supporting our HPC Data Center Operations. This role is responsible for leading the coordinated response to high-severity incidents, major outages, and critical service degradation events across HPC infrastructure and customer-facing production systems. The individual will serve as the operational command authority during major incidents, driving rapid detection, coordinated technical response, executive communication, service restoration, and post-incident operational improvement. The successful candidate must demonstrate operational leadership under pressure, the ability to coordinate cross-functional engineering organizations without direct authority, and the discipline to drive structured incident response during high-impact operational events.

Requirements

  • Bachelor's degree in Computer Science, Data Science, Statistics, or equivalent hands-on experience
  • 5+ years of experience in Incident Management, Major Incident Management, Reliability Operations, Production Operations, or related environments.
  • Proven experience leading high-severity incident response in enterprise-scale, high-availability environments.
  • Strong understanding of ITIL Incident Management practices and modern operational governance concepts.
  • Familiarity with Service Configuration Management / CMDB concepts and service dependency mapping.
  • Experience supporting operational automation and orchestration initiatives.
  • Excellent verbal and written communication skills with the ability to engage technical teams and executive leadership.
  • Experience with Jira Service Management or similar platforms.

Responsibilities

  • Serve as the Incident Commander and operational lead during P1/P2 incidents, major outages, and critical service degradation events.
  • Lead structured incident bridges during active operational events and coordinate cross-functional response models.
  • Monitor and manage major incident queues, escalations, and operational priorities.
  • Drive down Mean Time to Know (MTTK) and lead rapid service impact assessment and restoration prioritization.
  • Coordinate recovery validation activities and ensure stable restoration before incident closure.
  • Own executive-level communication during high-severity incidents.
  • Deliver concise, business-focused updates to leadership and stakeholders.
  • Partner closely with Problem Management, Change Enablement, SRE, Infrastructure Operations, and Production Engineering teams.
  • Promote blameless post-incident review culture focused on operational learning and continual improvement.
  • Leverage observability and telemetry platforms to improve incident detection, triage, and response effectiveness.
  • Identify opportunities to automate incident detection, routing, communication, and escalation workflows.
  • Develop and maintain operational playbooks, communication templates, and response workflows.

Benefits

  • 100% company-paid health insurance premiums (medical, dental, and vision) for employees; 75% company-paid coverage for dependents
  • Company-paid short-term and long-term disability insurance
  • Voluntary life, critical illness, and accident coverage available
  • Health Savings Accounts (HSA) – when combined with the High-Deductible Health Plan
  • Employee Assistance Program and wellness resources
  • 401(k) retirement plan with company match
  • Paid professional development and access to financial planning and legal services
  • Paid Time Off (PTO) and paid holidays
  • Professional development to support certifications, continuing education, or role-related training
  • Company events and team-building activities
  • Competitive wages with robust per diem and project allowance, when applicable
  • Overtime compensation for non-exempt workers for hours worked over 40 per week
  • Relocation and living-out allowance (as applicable and based on the successful candidate's circumstances)