Manager, Site Reliability Engineering, Infrastructure Engineering

Amazon • Culver City, CA

157d • $149,700 - $258,800

About The Position

Prime Video’s Studios Technology Services team is searching for a Manager, Site Reliability Engineering. The Studios Technology Services team supports our Media Supply Chain, Production Technology workflows, and Studios business teams (finance, marketing, asset management, productions). We work in a team environment and interact with engineering, productions, and studio executives at all levels. Our Infrastructure Engineering team is looking for a Sr Manager of Site Reliability Engineering to lead and manage critical infrastructure, applications, and a high-performing team. The team will operationalize the delivery and reliability of these systems and discover innovative ways to scale and operate them reliably as we expand. You will manage the systems to ensure that critical infrastructure is operating optimally and implement mechanisms to prevent service-impacting incidents. You will use trends and metrics to identify opportunities to improve existing frameworks, tools, and processes. Our SREs focus on automating infrastructure and enabling our partners and platforms at scale with engineering best practices in mind.

Requirements

Minimum of 10 years of hands-on systems reliability engineering experience, including providing senior-level technical direction on enterprise-level projects.
Exemplary leadership and interpersonal relationship abilities.
Experience setting strategic vision and owning and resolving issues that impact design or product success, or that address future concepts, products, or technologies.
Disciplined approach to maintaining and enforcing engineering best practices.
Ability to collaborate with cross-functional teams and business units.
Detail-oriented, self-organized, and capable of simultaneously tracking multiple issues of varying complexity.

Nice To Haves

Excellent management and communication skills.
Experience with algorithms, data structures, complexity analysis, and software design, as well as with Unix/Linux systems, AWS, and performance and application issues.

Responsibilities

Building, leading, and managing a high-performing team of site reliability engineers and leads.
Designing, documenting, planning, and operating systems, as well as resolving critical issues.
Supporting a variety of complex media systems, infrastructure, and applications.
Overseeing technical analysis, cost and effort estimation, production environment platform/system design, architectural fit and compliance, resource schedules, delivery milestones, collaboration, and overall production environment quality.
Developing goals and strategy for the team.
Building relationships and influencing internal customers and partner teams.
Managing on-call rotations across continents.
Earning the trust of peers and stakeholders through a body of work and day-to-day interactions.
Championing SRE best practices and Infrastructure as Code (IaC), providing thought leadership, and establishing enterprise-level infrastructure patterns.
Providing coaching and mentorship to team members.
Collaborating with other teams and engineers to find innovative solutions for moderately complex problems.
Developing technical documentation to support projects, initiatives, SoWs and proposals.

Benefits

Medical, financial, and/or other benefits.
Equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package.

What This Job Offers

Job Type

Full-time

Career Level

Manager
