About the Position

The Site Reliability Product Owner oversees end-to-end release operations for a multi-application software portfolio supporting multiple missions and effectivities. The role coordinates releases across applications and manages bug/fix communications, customer and multi-level leadership approvals, incident response, and post‑incident reporting. As the principal technical and programmatic face to the customer and Boeing, the Product Owner needs hands‑on expertise with AWS infrastructure and Python automation, along with a working understanding of signal‑processing algorithms sufficient to interpret anomalous behavior and advise corrective actions. The role also owns on‑call scheduling and, while assigned to on‑call, is expected to be available to respond at least 80% of the time.

The Site Reliability Product Owner is responsible for monitoring all environments and for implementing and maintaining a comprehensive monitoring strategy across them. This includes real-time observation of system health, anomaly detection, and proactive alerting on emerging issues such as resource exhaustion and performance degradation. It also includes environment and application monitoring dashboards, and the use of APM (application performance monitoring) tools to track application and infrastructure performance against thresholds that support proactive intervention.

The role also ensures that release candidates are properly tested and that release packages are coordinated: validating release candidates through operational and enterprise testing, compiling release packages, and facilitating the transition of development work into operational environments. In addition, the Product Owner maintains release control processes (scheduling, versioning, change control), tracks and verifies fixes, leads diagnostics and mitigation during outages, prepares and presents executive incident slide decks, coordinates cross‑functional teams, implements and improves release processes and KPIs (deployment frequency, lead time, change success rate, MTTR), and supports proposals and future work by staying current on emerging technologies and regulatory changes.

Requirements

  • hands‑on expertise with AWS infrastructure and Python automation
  • working understanding of signal‑processing algorithms sufficient to interpret anomalous behavior and advise corrective actions
  • active U.S. Top Secret/SCI security clearance (U.S. citizenship required); a U.S. security clearance that has been active within the past 24 months is considered active

Responsibilities

  • oversees end-to-end release operations for a multi-application software portfolio
  • coordinates releases across applications and manages bug/fix communications
  • manages customer and multi-level leadership approvals
  • manages incident response and post‑incident reporting
  • serves as the principal technical and programmatic face to the customer and Boeing
  • owns on‑call scheduling, with an expectation of being available to respond at least 80% of the time while assigned
  • monitors and observes all environments
  • implements and maintains comprehensive monitoring strategies across all environments, including real-time observation of system health, anomaly detection, and proactive alerting on emerging issues such as resource exhaustion and performance degradation
  • ensures proper testing of release candidates and coordinates release packages
  • maintains release control processes (scheduling, versioning, change control)
  • tracks and verifies fixes
  • leads diagnostics and mitigation during outages
  • prepares and presents executive incident slide decks
  • coordinates cross‑functional teams
  • implements and improves release processes and KPIs (deployment frequency, lead time, change success rate, MTTR)
  • supports proposals and future work by staying current on emerging technologies and regulatory changes