About The Position

As a Senior Product Delivery Associate in Cobrand & Partner Product Incident Management, you are trusted with resolving negative customer experiences through monitoring, working with cross-functional teams, building key relationships, and enabling the product to continuously deliver value to our stakeholders. The role entails owning end-to-end incident management—triage, coordination, communications, and post-incident review—to drive restoration of systems/services while ensuring controls, compliance, and operational rigor. You will also drive AI-enabled incident operations by using automation and analytics to detect patterns, reduce repeat incidents, improve change readiness, and continuously strengthen stability and scalability.

Requirements

  • 3+ years of experience in incident management/IT operations/SRE support within an enterprise environment.
  • Proven ability to lead triage calls across application, infrastructure, and vendor teams, driving rapid decisions and timely restoration.
  • Strong executive-ready communication skills: concise status updates, impact articulation, ETA management, and stakeholder alignment under pressure.
  • Demonstrated experience building and maintaining incident timelines, action trackers, and post-incident artifacts (RCA, corrective actions, lessons learned) with disciplined follow-through.
  • Track record of identifying themes and trends (recurring failure modes, top drivers, change-related incidents) and converting insights into prioritized remediation with measurable outcomes.
  • Working knowledge of ITIL incident, problem, and change management concepts, including a risk/controls mindset and audit-ready documentation.
  • AI-focused mindset for continuous improvement (e.g., alert correlation, auto-ticket enrichment/summarization, anomaly detection, routing/noise reduction), in partnership with engineering.
  • Hands-on familiarity with monitoring/observability and ticketing tools (e.g., ServiceNow/Jira; Splunk/Datadog/Dynatrace/Grafana or similar).
  • Strong organizational skills to manage multiple concurrent incidents, dependencies, and stakeholders while meeting SLAs/OLAs.

Nice To Haves

  • Experience balancing multiple priorities while providing clear communication to stakeholders.
  • Working familiarity with AI/automation concepts in incident operations (e.g., noise reduction, correlation, summarization, auto-triage) and the ability to help teams adopt them.
  • Comfort coordinating cross-functional teams (app, infra, security, vendor, business) to restore service quickly and predictably.
  • Excellent executive communication skills—clear, concise, and consistent updates during ambiguity and pressure.
  • Tooling experience with ServiceNow, Splunk, Alteryx, and LLM-based tools.
  • SQL experience is a strong plus.
  • Proficiency with Excel and PowerPoint.

Responsibilities

  • Lead triage, coordination, communications, and post-incident review for incident tickets.
  • Run incident communications as a product: deliver crisp, audience-specific updates (technical + executive) with impact, scope, ETA/next update time, and decision/risk logs throughout the lifecycle.
  • Lead cross-functional triage bridges: establish command structure, assign owners, drive time-boxed troubleshooting, remove blockers, and keep teams aligned to restore service within agreed timelines.
  • Own end-to-end incident tracking: maintain a single source of truth for timelines, actions, owners, dependencies, and status; ensure closure criteria are met and artifacts are complete.
  • Translate incident data into insights: identify themes and trends across incidents (top drivers, recurring components, change-related failures, control gaps) and present actionable narratives to leaders.
  • Convert insights into measurable actions: create prevention playbooks and prioritized remediation backlogs; define success metrics (e.g., reduced repeats/MTTR) and track actions to completion.
  • Drive AI-enabled automation: use AI for alert correlation/noise reduction, anomaly detection, auto-triage routing, and incident summarization to accelerate diagnosis and reduce manual toil.
  • Operationalize proactive reliability: use trend analytics to recommend targeted automation, monitoring improvements, and resiliency work before issues become incidents.
  • Standardize and continuously improve process: refine severity frameworks, comms templates, escalation paths, and runbooks based on lessons learned and performance outcomes.

Benefits

  • comprehensive health care coverage
  • on-site health and wellness centers
  • a retirement savings plan
  • backup childcare
  • tuition reimbursement
  • mental health support
  • financial coaching