Director, Engineering Operations

TKO•Austin, TX

3d•Hybrid

About The Position

On Location is a global leader in premium experiential hospitality, offering ticketing, curated guest experiences, live event production and travel management across sports, entertainment, fashion and culture. On Location provides unrivaled access for corporate clients and fans looking for official, immersive experiences at marquee events, including the Olympic and Paralympic Games, FIFA World Cup 2026, Super Bowl, NCAA Final Four, and more. An official partner and/or service provider to over 150 iconic rights holders, such as the IOC (the Milano Cortina 2026 and Los Angeles 2028 Olympic Games), FIFA, NFL, NCAA, UFC, WWE, and PGA of America, the company also owns and operates a number of its own unique experiences. On Location is a subsidiary of TKO Group Holdings, Inc. (NYSE: TKO).

TKO Group Holdings, Inc. (NYSE: TKO) is a premium sports and entertainment company. TKO owns iconic properties including UFC, the world’s premier mixed martial arts organization; WWE, the global leader in sports entertainment; and PBR, the world’s premier bull riding organization. Together, these properties reach 1 billion households across 210 countries and territories and organize more than 500 live events year-round, attracting more than three million fans. TKO also services and partners with major sports rights holders through IMG, an industry-leading global sports marketing agency; and On Location, a global leader in premium experiential hospitality.

Requirements

10+ years in software engineering operations, site reliability engineering, platform or DevOps leadership supporting 24x7 systems
Experience leading teams and improving their performance as measured against DORA metrics
Proven track record leading incident response and postmortems, with measurable reductions in MTTD, MTTI, and MTTR and increases in MTBF
Hands-on experience implementing observability and SLO/SLI frameworks
Strong background with CI/CD, trunk-based development, automated testing strategies, and release orchestration
Security-by-design mindset, with experience in IRP/SIRP operations and DevSecOps practices
Excellent stakeholder management; effective and concise communication skills with both technical and non-technical audiences
Ability to lead and execute through ambiguity and high-demand, high-stakes events

Nice To Haves

Experience with Datadog, feature flagging platforms, and progressive delivery
History of operating platforms supporting large-scale events, ticketing, payments, or high-traffic commerce
Experience applying AI in technical operations and/or software delivery
Background in capacity planning, performance engineering, and chaos testing
Familiarity with regulated environments and audit processes; comfortable publishing operational evidence and controls
Experience shaping org-level SDLC, STLC, and SSDLC standards across internal and partner teams

Responsibilities

Own end-to-end engineering operations across RTB: intake, triage, prioritization, change/release governance, incident response, and post-mortems
Drive AI-enabled operational efficiency and automation across the SDLC, STLC, and SSDLC
Establish comprehensive observability with golden signals, SLIs/SLOs, anomaly detection, auto-remediation, and cost/capacity insights
Define and uphold SLOs for critical domains and guest journeys (checkout, inventory sync, fulfillment, payments)
Standardize Datadog logs, metrics, traces, and RUM/synthetics to accelerate detection and root-cause analysis
Continuously measure and improve delivery performance through DORA metrics
Enforce release discipline: a balance of planned vs. unplanned releases, readiness criteria, rollback playbooks, and event blackout windows
Support major events with elevated operational rigor: dry runs, performance testing, strict change controls, enhanced monitoring, and clear comms protocols
Partner with Business Operations, Technical Product, and Solutions Architecture to maintain a single, aligned view of priorities, dependencies, and SLAs
Lead post-event and incident post-mortems to drive continuous improvement of SOPs, runbooks, response protocols, and reliability
Mature incident and security response in close partnership with TechOps and Security & Compliance (IRP/SIRP)
Continuously reduce technical debt across performance, security, and maintainability
Foster a blameless learning culture with KPI/OKR-driven improvements and transparent communication
Publish clear weekly and monthly operational health and stability reporting