About The Position

The Incident Manager within the Incident Management Hub (IMH) is responsible for leading the end-to-end management of high-impact technology incidents to minimize business disruption, restore services quickly, and protect the organization’s reputation. This role serves as the central point of coordination during major incidents, ensuring effective engagement across technology teams, business stakeholders, executive leadership, and support partners.

This position requires a strong business mindset, operational discipline, and the ability to lead through high-pressure situations. The Incident Manager is expected to drive structured incident response, provide timely and audience-appropriate communications, align recovery efforts with business priorities, and support ongoing service improvement initiatives.

As part of the IMH team, this role also contributes to a 24x7 operational support model through assigned shifts, weekend coverage, and on-call rotations. The Incident Manager plays a critical role not only in restoring service during outages, but also in identifying trends, improving processes, strengthening resiliency, and enhancing executive visibility through reporting, automation, and continuous improvement efforts.

Requirements

  • Bachelor’s degree in Information Technology, Business Administration, or a similar field.
  • 10+ years of experience working in Incident Management or a similar role.
  • Strong experience in Incident Management, IT Operations, or IT Service Management, including direct leadership of major or enterprise-impacting incidents.
  • Bilingual English/Spanish communication skills are required, particularly for working in multinational or highly collaborative environments.
  • Hands-on experience using Dynatrace for real-time monitoring, automated alerting, and building dashboards that provide actionable insights into service health and incident trends.
  • Proven ability to implement and interpret synthetic monitoring tests to proactively detect service degradation and prevent customer-impacting incidents.
  • Strong background with enterprise observability platforms (e.g., Dynatrace, Splunk) to analyze telemetry data, identify root causes, and support rapid incident resolution.
  • Working knowledge of Linux and Windows server environments to effectively assess system-level issues, collaborate with infrastructure teams, and understand platform-specific incident impacts.
  • Demonstrated ability to analyze operational data and incident trends, using KPIs and KRIs to identify systemic risks and improve service stability.
  • Strong understanding of ITIL framework, incident management principles, and service management best practices.
  • Proven ability to communicate effectively with executive leadership and translate technical issues into meaningful business impact.
  • Experience working in complex, enterprise-level technology environments with multiple integrated systems and stakeholders.
  • Strong analytical, critical thinking, and risk assessment skills with a business-first mindset.
  • Demonstrated ability to make sound decisions under pressure and manage multiple priorities in a fast-paced environment.
  • Excellent stakeholder management, leadership, facilitation, and coordination skills.
  • Experience with incident reporting, trend analysis, service improvement, and operational metrics.

Nice To Haves

  • Familiarity with automation tools, reporting platforms, and AI-driven analytics capabilities is preferred.
  • Strong executive presence and communication skills.
  • Ability to lead high-pressure situations with confidence and professionalism.
  • Strong collaboration across technical and non-technical teams.
  • Excellent organizational skills and attention to detail.
  • Customer-focused, business-oriented approach to service restoration.
  • Commitment to continuous improvement, resiliency, and operational excellence.
  • Established work history or equivalent demonstrated through a combination of work experience, training, military service, or education.
  • Experience with Microsoft Office products.

Responsibilities

  • Lead and coordinate the resolution of major, critical, and high-priority technology incidents to ensure rapid restoration of services in alignment with business impact, operational priorities, and service level expectations.
  • Serve as the central point of contact during high-severity incidents, acting as the primary liaison between technology teams, business stakeholders, executive leadership, and support partners.
  • Facilitate incident bridge calls and command-center activities, establish action plans, assign accountability, maintain momentum, and drive incidents through to resolution.
  • Provide clear, concise, and business-focused communications throughout the incident lifecycle, including stakeholder notifications, executive updates, status reports, and post-incident summaries.
  • Assess business impact and prioritize response and recovery efforts based on operational, customer, regulatory, reputational, and financial risk.
  • Ensure appropriate and timely escalation of incidents to senior leadership, technical teams, vendors, and business partners as needed.
  • Coordinate cross-functional teams during incident response, including infrastructure, application support, networking, cybersecurity, service desk, change management, and problem management teams.
  • Monitor incident progression, track key decisions and actions, and ensure adherence to incident management processes and governance standards.
  • Oversee and support root cause analysis activities, ensuring corrective and preventive actions are documented, tracked, and implemented to reduce repeat incidents.
  • Identify patterns, recurring issues, systemic risks, and process gaps through incident trend analysis, and recommend strategic improvements to strengthen service stability and operational resilience.
  • Partner with Change Management, Problem Management, Engineering, Infrastructure, and Business Continuity teams to align incident response with long-term prevention and resiliency objectives.
  • Track and report on key performance indicators such as mean time to restore, incident trends, SLA performance, business impact, executive escalations, and service stability metrics.
  • Support crisis management activities and participate in business continuity and disaster recovery testing, planning, and response exercises.
  • Contribute to process maturity by helping standardize procedures, improve incident governance, and promote best practices across the organization.
  • Mentor junior team members and support knowledge sharing, coaching, and team development within the IMH function.
  • Drive the adoption of automation, dashboards, and AI-enabled reporting solutions to improve incident analytics, predictive insights, executive reporting, and operational efficiency.
  • Perform additional duties and leadership responsibilities as assigned in support of business, technology, and operational objectives.

Benefits

  • Fair and competitive rewards package.
  • Benefits designed to support you, your family and your well-being, now and into the future.