Data Center Incident Program Manager

Jobgether
$125,600–$228,000

About The Position

The Data Center Incident Program Manager will lead the end-to-end incident management lifecycle for mission-critical data center environments, ensuring operational resilience and rapid recovery. This role requires a strategic and detail-oriented professional who can define standards, establish protocols, and lead cross-functional teams during high-impact incidents. You will serve as Incident Commander when necessary, drive post-incident analysis, and implement corrective actions to prevent recurrence. By designing governance frameworks, reporting structures, and readiness exercises, you will enhance reliability, accountability, and operational excellence. The ideal candidate thrives under pressure, brings technical credibility in data center operations, and fosters continuous improvement across teams and processes. Your work will directly impact the stability and scalability of high-density compute infrastructure.

Requirements

  • 7+ years of experience in mission-critical infrastructure, data center operations, or reliability engineering
  • Proven experience leading major incidents and war rooms, with calm and decisive leadership under pressure
  • Strong familiarity with facilities systems, hardware operations, or network infrastructure
  • Demonstrated ability to run post-incident reviews and track corrective actions effectively
  • Experience defining and operationalizing incident management processes, documentation, and escalation paths

Nice To Haves

  • Experience in hyperscale or high-density AI compute environments
  • Familiarity with ISO-based quality systems
  • Proficiency with incident tooling such as PagerDuty, ServiceNow, or Jira

Responsibilities

  • Define incident severity levels, escalation thresholds, and lifecycle stages from declaration to closure
  • Establish and maintain incident response standards, war rooms, runbooks, and stakeholder communication templates
  • Lead readiness activities including simulations, tabletop exercises, and on-call Incident Commander rotations
  • Serve as Incident Commander during high-impact events, coordinating cross-functional teams and driving structured response
  • Conduct post-incident reviews, perform root cause analyses, and track corrective and preventive actions to closure
  • Implement incident management tools, dashboards, and program metrics to monitor performance and readiness
  • Communicate trends and systemic gaps to design and operations teams for ongoing improvement

Benefits

  • Competitive salary range of $125,600–$228,000 USD plus equity and performance-related bonuses
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses
  • 401(k) plan with employer match
  • Paid parental, medical, and caregiver leave
  • Flexible paid time off and 13+ company holidays, plus additional coordinated office closures
  • Mental health and wellness support
  • Employer-paid life and disability coverage
  • Annual learning and development stipend and relocation support for eligible employees
  • Meal benefits and other taxable fringe perks such as charitable donation matching and wellness stipends