Sr Principal Site Reliability Engineer

The Walt Disney Company•San Francisco, CA

67d

About The Position

P5/P6: SRE Lead, Content Distribution Engineering Media Engineering. SF CA / LA CA / NYC

Team Intro

On any given day at Disney Entertainment & ESPN Technology, we’re reimagining ways to create magical viewing experiences for the world’s most beloved stories while also transforming Disney’s media business for the future. Whether that’s evolving our streaming and digital products in new and immersive ways, powering worldwide advertising and distribution to maximize flexibility and efficiency, or delivering Disney’s unmatched entertainment and sports content, every day is a moment to make a difference to partners and to hundreds of millions of people around the world.

A few reasons why we think you’d love working for Disney Entertainment & ESPN Technology

Building the future of Disney’s media business: DE&E Technologists are designing and building the infrastructure that will power Disney’s media, advertising, and distribution businesses for years to come.
Reach & Scale: The products and platforms this group builds and operates delight millions of consumers every minute of every day – from Disney+ and Hulu, to ABC News and Entertainment, to ESPN and ESPN+, and much more.
Innovation: We develop and execute groundbreaking products and techniques that shape industry norms and enhance how audiences experience sports, entertainment & news.

Media Engineering is an innovative organization that is focused on providing the best possible video playback experience, art, and metadata to customers around the world, powered by exceptional technology. This strategic work requires streamlining and repurposing technology across different business and distribution channels – including streaming, linear, and theatrical – so that technology can ebb and flow across the needs of the business.

Job Description

This role will report to the Senior Vice President of Media Engineering, with a scope spanning several departments and hundreds of developers and operators.

Requirements

Minimum of 12 years of engineering leadership experience, including managing and influencing teams directly and indirectly
Bachelor's or higher degree in Engineering or a related field, or equivalent experience.
Experience working across complex globally connected teams with a variety of stakeholders
Experience with large-scale globally distributed platforms including content preparation, distribution, playback, operations, and infrastructure
Possess a vision for exceptional escalation management and engineering excellence.
Ability to develop, implement, and socialize strategies and tactics to drive improvement in stability, system performance, team capability, and operational efficiency.
Knowledge of how to use data to understand and improve business performance.
Passion for developing teams, with a focus on continuous learning.
Track record of developing strong cross-functional and cross-regional relationships.

Nice To Haves

Direct experience with major Content Delivery Network integrations
Experience in media streaming technologies, especially media processing workflows and tooling, media players and devices, and content delivery strategies.
Experience with both high-scale back-end services (cloud and datacenter) along with client development on a variety of devices (mobile, web, living room devices).
Experience with Media Operations and/or Infrastructure management

Responsibilities

Drive high-availability with an ultimate goal of 99.999% incident-free uptime across the entire platform.
Ensure fast response, proactive prevention, effective monitoring, and sound architectural design.
Be accountable for platform stability and uptime, from the processing platform and content supply chain through CDN delivery and playback.
Develop a solid understanding of all critical data flows and ensure proper instrumentation and alerting practices.
Drive redundancy and resiliency strategy across thousands of servers and network links in both datacenter and cloud environments.
Own Media Engineering’s Incident Response process, ensuring follow-ups and proactive actions to avoid service incidents.
Partner with Infrastructure, Operations, Product, and Development teams to ensure best practices, conducting audits and reviews across each domain.
Drive automation strategy for more rapid, safe releases, tighter content SLAs, and a more efficient organization.

Benefits

A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.
