Major Incident Manager

TEKsystems
St. Louis, MO
Remote

About The Position

The Major Incident Manager is a high-impact leadership role responsible for the end-to-end management of critical IT service disruptions that affect end users and operations. The role goes beyond traditional incident coordination: it requires command authority, strong executive communication, and a deep understanding of IT operations. It exists because the current incident model is not meeting enterprise needs and requires experienced leadership to reset and stabilize Tier 1 incident response across multiple facilities. The Major Incident Manager acts as the "Incident Commander" (the single point of command), driving the swift restoration of critical services while maintaining transparent communication with executive leadership and clinical stakeholders. The role is accountable for incident outcomes, not just coordination or process.

Requirements

  • 8+ years of progressive IT operations or enterprise support experience.
  • 5-7+ years in IT major incident management or IT service management.
  • Excellent crisis leadership and decision-making abilities under pressure.
  • Strong analytical mindset with the ability to interpret complex data and performance metrics to drive strategic decisions.
  • Exceptional communication and interpersonal skills, capable of effective collaboration with stakeholders at all organizational levels.
  • Strong problem-solving and organizational skills, with a focus on details and time management.

Nice To Haves

  • ITIL 4 Foundation or higher.
  • Experience with industry-standard monitoring and observability tools (e.g., ScienceLogic, SolarWinds).

Responsibilities

  • Lead the Major Incident Bridge by facilitating 24/7 technical bridge calls and "war rooms" to triage and resolve Priority 1 (P1) and Priority 2 (P2) incidents impacting multiple systems.
  • Lead real-time decision-making under pressure, balancing technical recovery with safety and clinical impact.
  • Quickly evaluate the scope of technical outages to determine their impact on patient safety and critical business applications.
  • Identify trends, anomalies, and recurring failure patterns.
  • Issue regular, clear updates to leadership (CIO/CTO) and Service Portfolio leads using pre-defined communication templates and protocols.
  • Maintain confidence and calm in high-pressure situations with leaders and clinical partners.
  • Own and facilitate after-action reports and Post-Incident Reviews (PIRs) within 48 hours to identify root causes and drive preventive actions.
  • Conduct root cause discussions and ensure corrective actions are identified and tracked.
  • Own and continuously optimize the enterprise incident management process in alignment with ITIL best practices.
  • Coordinate with third-party vendors and internal cross-functional teams (Network, Security, Clinical Apps) to ensure rapid service recovery.
  • Use AIOps, triage, and monitoring tools, dashboards, and alerting systems across on-premises and cloud environments (e.g., SolarWinds, NetPath, ScienceLogic) to help reduce MTTD and MTTR.
  • Serve as an escalation point for complex operational incidents, guiding technical teams to swift and effective resolution of critical monitoring issues.
  • Analyze monitoring data and performance metrics (MTTD, MTTA, MTTR, Incident Recurrence Rate, SLA Compliance) to identify trends, anomalies, and potential issues, providing recommendations for improvement and capacity planning.
  • Identify and implement automation opportunities for major incident management and routine tasks to reduce manual workload and improve efficiency.
  • Collaborate with cross-functional teams (Application, Network, Security, Cloud Enablement, Managed Service Provider, etc.) to maintain comprehensive Major Incident Management documentation, including standard operating procedures (SOPs) and runbooks.
  • Participate in root cause analysis (RCA) and post-incident reviews to prevent recurring issues and drive long-term solutions.

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Spending Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)