Manager, Platform Reliability

Adobe • Lehi, UT

About The Position

We are seeking a hands-on Platform Reliability Manager to lead the reliability, operations, and continuous improvement of several business-critical platforms used across Marketing, HR, and Business Operations. This role sits within Adobe Technology Services and is responsible for ensuring these platforms are dependable, observable, and well-operated at scale. You will lead a small, globally distributed team, setting operational standards, guiding incident response, and partnering closely with both internal partners and external SaaS providers. The ideal candidate brings a strong foundation in operating reliable platforms, along with the ability to communicate clearly above the technical layer and hold internal and external vendors accountable to the consistency and service levels Adobe expects.

Requirements

Experience managing and developing a global, distributed reliability team
Strong understanding of observability, incident management, and operational standard methodologies
Experience crafting or enforcing change and deployment processes that balance speed with stability
Demonstrated ability to manage vendor relationships, including setting expectations, reviewing performance, and driving accountability during incidents or service degradation
Familiarity with using AI effectively, through context curation and documentation, to achieve high velocity and quality in execution
Bachelor's degree in engineering or information systems
10+ years of experience in a similar

Responsibilities

Own the reliability and operational health of enterprise operational efficiency platforms, with a mix of end-to-end ownership and shared operational responsibility
Lead and develop a geographically dispersed team across North America and Europe, including managing an on-call rotation
Establish and evolve a standard operational model for change management and incident response across platforms
Drive operational rigor through strong observability practices, including metrics, alerting, and insight into platform health
Lead response to major incidents, ensuring clear communication, effective coordination, root cause identification, and durable remediation
Act as the primary operational point of contact for SaaS platform vendors, holding providers accountable for reliability, incident response, and service commitments
Communicate platform health, risks, and tradeoffs in business-relevant terms to functional partners and leadership
Document operational standards as context for AI, and leverage AI to improve reliability practices
