Director, SRE

Cast & Crew

About The Position

At Cast & Crew, we’ve empowered creativity and supported the global entertainment industry for decades. Together with our family of brands (Backstage, CAPS, Checks & Balances, Final Draft, Media Services, Sargent-Disc, and The TEAM Companies), we operate as a combined entertainment technology and services provider offering industry-standard screenwriting and accounting software, digital payroll products, data & reporting, and a host of creative tools. The industry continues to move faster than ever, and the need for our expertise, our technology, and our people has never been greater. We are a production’s best ally every step of the way.

#OneCastOneCrew

The Director of SRE is a senior leadership role responsible for the reliability, scalability, and operational excellence of a large, multi-discipline engineering organization. You will own the platform and DevOps engineering functions, building and leading the teams, tools, and practices that allow software engineering, data engineering, and QA teams to ship confidently and run sustainably.

You will report directly to the VP of Engineering and partner closely with software and data engineering leadership to define and deliver on reliability and platform strategy. This is both a hands-on leadership role and a strategic one: you will be as comfortable driving organizational design conversations as you are reviewing incident post-mortems or evaluating a new observability toolchain.

Requirements

8+ years in SRE, DevOps, or platform engineering, with at least 3 years in a senior leadership role managing managers or senior ICs.
Demonstrated experience leading platform or DevOps engineering teams in a large, multi-team engineering organization (100+ engineers).
Deep hands-on background in CI/CD, container-based infrastructure, cloud platforms (Azure preferred), and observability tooling.
Experience defining and scaling on-call programs, incident management processes, and reliability practices.
Strong communication skills — able to translate technical complexity for senior stakeholders and drive alignment across engineering leadership.
Track record of building high-performing teams through hiring, mentoring, and clear goal-setting.

Nice To Haves

Familiarity with Azure DevOps pipelines and EKS-based deployment patterns.
Experience with Team Topologies principles or similar frameworks for team structure design.
Background in highly regulated or enterprise environments.
Exposure to feature flag management (e.g., Unleash) and progressive delivery strategies.

Responsibilities

Platform & DevOps Engineering
Own the platform engineering roadmap: CI/CD pipelines, container orchestration (EKS/Kubernetes), secrets management, and infrastructure-as-code standards across the org.
Drive standardization and adoption of DevOps best practices, including YAML pipeline conventions, Dockerfile standards, and deployment patterns.
Partner with software and data engineering teams to reduce toil, improve deployment frequency, and reduce time-to-restore.
Oversee the engineering organization's observability strategy, including tooling (New Relic) and alerting integration (PagerDuty, Microsoft Teams).

People Leadership & Org Building
Lead, mentor, and grow a team of SRE and DevOps engineers, establishing clear career paths, setting high standards, and fostering a culture of ownership and psychological safety.
Define team topology for the SRE and platform functions, including team scope, interfaces with stream-aligned teams, and on-call responsibilities.
Build hiring plans and execute on them — owning the full cycle from role definition through onboarding.
Serve as an organizational model for blameless culture, continuous improvement, and cross-functional collaboration.

Incident Management & Reliability
Own the incident management process end-to-end: severity classification, on-call rotation design, escalation paths, post-incident reviews, and tooling integration.
Reduce MTTR and increase MTBF through systematic root cause analysis, reliability investments, and proactive capacity planning.
Champion SLO/SLI/error budget practices across engineering teams.

Strategy & Governance
Contribute to engineering-wide standards and documentation, including coding standards, CI/CD expectations, and operational runbooks in Confluence and Azure DevOps.
Act as a key voice in architectural decisions with reliability, scalability, or operational implications.
Stay current on industry trends and lead evaluation of emerging tools and practices.