Business Analyst IV - Alert Management & Observability Standards Lead

Astreya
Remote, CA
$98,040 - $154,800
Onsite

About The Position

The Business Analyst IV will provide solutions that help attain business outcomes. The Alert Management & Observability Standards Lead is responsible for rationalizing and governing all system alerts to ensure they align with department priorities, operational coverage models, and service reliability goals. This role defines alerting standards, reviews and approves alerts before they are routed to the 24x7 Eyes-on-Glass Operations team, and establishes a scalable approach to cataloging alert response instructions (runbooks/playbooks) so responders can take consistent, high-quality actions. This position operates at the intersection of the IT Operations Command Center (OCC), engineering/application teams, platform/monitoring tool owners, and service owners, ensuring alerts are actionable, prioritized, and paired with clear response guidance.

Requirements

  • 5+ years in IT Operations, SRE, Observability, Monitoring Engineering, or Incident Management
  • Demonstrated success reducing noise and improving actionability across enterprise alerting ecosystems
  • Experience with common monitoring/observability tools (e.g., Splunk, AppDynamics, Dynatrace, Datadog, Prometheus/Grafana, Azure Monitor, CloudWatch, ServiceNow Event Mgmt or similar)
  • Strong understanding of incident response workflows and operational coverage models (24x7 vs. business hours)
  • Strong understanding of CMDB/service ownership concepts and dependency mapping
  • Strong understanding of standard operating procedures/runbooks and knowledge management
  • Excellent stakeholder management and ability to drive standards across teams

Nice To Haves

  • Experience designing or operating an Operations Command Center / NOC / SOC-style “eyes-on-glass” model
  • Familiarity with ITIL Event Management, SRE principles, and service reliability practices
  • Experience with automation for alert enrichment, correlation, and routing (e.g., event correlation, deduplication, noise suppression)
  • Background in governance frameworks and operating rhythm design (cadences, controls, compliance traceability)

Responsibilities

  • Establish and maintain a department-wide alert rationalization framework that evaluates alerts for business/service criticality, operational priority, actionability, signal-to-noise ratio, ownership, and escalation paths.
  • Perform regular alert reviews to ensure alert quality, correct routing, and alignment with operational coverage.
  • Lead continuous improvement efforts to reduce alert fatigue while preserving detection of true incidents and high-impact degradation.
  • Define and enforce alerting standards including severity definitions, thresholds, required metadata, naming conventions, tagging taxonomy, and routing rules.
  • Create a standardized Alert Design Checklist and approval workflow.
  • Partner with tool/platform owners to ensure standards are embedded in monitoring tooling.
  • Act as gatekeeper for determining which alerts should go to 24x7 Eyes-on-Glass for immediate triage, route to on-call engineering, create tickets for business-hours handling, or be suppressed, aggregated, or converted to dashboards/health indicators.
  • Ensure routing aligns with operational responsibilities and skills of the Eyes-on-Glass team, department priorities, and service ownership and support models.
  • Establish a consistent approach to cataloging response instructions for every actionable alert, including alert meaning, triage steps, remediation actions, and escalation procedures.
  • Own the runbook template and ensure runbooks are versioned, maintained, and reviewed on a defined cadence.
  • Partner with service owners to ensure runbooks stay current as systems change.
  • Define and publish KPIs that demonstrate alerting health and operational performance.
  • Facilitate governance forums with service owners and engineering leads to review alert quality and backlog.
  • Coach service teams on best practices such as SLIs/SLOs, alert thresholds, dependency monitoring, and incident correlation.
  • Drive adoption of observability patterns.
  • Support major incident learning by feeding post-incident insights back into improved alerts and runbooks.
  • Deliver alerting standards, intake and approval workflow, rationalization of noisy services, runbook template launch, and central alert catalog within the first 45 days.

Benefits

  • Employment in the fast-growing IT space providing you with a variety of career options
  • Opportunity to work with some of the biggest firms in the world as part of the Astreya delivery network
  • Introduction to new ways of working and awesome technologies
  • Career paths to help you establish where you want to go
  • Focus on internal promotion and internal mobility
  • Free 24/7 accessible Professional Development through LinkedIn Learning and other online courses
  • Education Assistance
  • Dedicated management to provide you with on-point leadership and care
  • Numerous on-the-job perks
  • Market-competitive compensation and insurance, health, and wellness benefits
  • Medical provided through UHC (PPO, HSA, Surest options) or through Kaiser (HMO option only; California employees only)
  • Dental provided through UHC Nationwide
  • Vision provided by UHC
  • Flexible Spending Account for Health & Dependent Care
  • Pre-Tax Account for Commuter Benefit/Parking & Transit (location-specific)
  • Continuing Education and Professional Development via various integrated platforms, e.g. Udemy and Coursera
  • Corporate Wellness Program provided by Goomi Group
  • Employee Assistance Program
  • Wellness Days
  • 401k Plan
  • Basic and Supplemental Life Insurance
  • Short-Term & Long-Term Disability
  • Critical Illness, Critical Hospital, and Voluntary Accident Insurance
  • Tuition Reimbursement (available 6 months after start date, capped)
  • Paid Time Off (accrued and prorated, maximum of 120 hours annually)
  • Paid Holidays
  • Any other statutory leaves, paid time, or other ancillary benefits required under state and federal law