Executive Director, Platform Governance & Strategy

OCC•Chicago, IL

1d•$197,800 - $343,900•Hybrid

About The Position

This role provides executive-level leadership over Platform Engineering Governance & Compliance, Site Reliability Engineering, Strategy & Architecture, and Metrics & Reporting within the Platform Engineering organization. The leader in this role is accountable for scaling a mature SRE practice, driving cloud architectural standards and multi-year strategy, and ensuring the organization operates with clear, data-driven visibility into platform health and performance. A critical dimension of this role is ownership of the FinOps and SecOps domains as Product Manager, alongside governance of PE compliance obligations spanning incidents, risks, and audit findings. The ideal candidate brings deep technical credibility, executive communication skills, and a proven track record of operationalizing reliability, architectural governance, and compliance programs at scale in a regulated environment.

Requirements

Proven executive-level leadership of SRE, cloud engineering, or platform reliability organizations in a regulated industry environment.
Demonstrated ability to build and scale SRE practices including SLO/SLA frameworks, on-call models, error budgets, and incident response programs.
Deep expertise in cloud architecture strategy and governance, with experience defining and driving enterprise-wide architectural standards.
Strong track record of cross-functional partnership with Program/Product Management, translating platform capabilities into sequenced, delivery-ready roadmaps.
Demonstrated experience serving in a Product Manager capacity for technical domains such as FinOps, SecOps, or platform tooling, including ownership of roadmap, prioritization, and stakeholder alignment.
Experience establishing and managing governance and compliance frameworks within a platform or infrastructure engineering organization, including oversight of incidents, problem management, risk items, and audit obligations.
Ability to design and maintain metrics and reporting frameworks that provide meaningful visibility into platform health, engineering performance, and compliance posture.
Exceptional written and verbal communication skills; ability to translate technical complexity into executive-level insights and business decisions.
Demonstrated ability to lead high-performing, highly technical teams through accountability, coaching, and clear ownership models.
Experience managing work in Agile/Scrum environments with strong prioritization and deadline management discipline.
Deep knowledge of SRE tooling and observability platforms (e.g., Prometheus, Grafana, PagerDuty, Datadog, or equivalents).
Expert-level knowledge of cloud platforms: AWS, Azure, or GCP; experience with multi-cloud or hybrid environments preferred.
Strong working knowledge of cloud-native architecture patterns and Infrastructure as Code principles.
Familiarity with container orchestration and streaming platforms (Kubernetes, Kafka) and CI/CD tooling (GitHub Actions, Jenkins, or equivalents).
Experience with metrics and reporting platforms; ability to design KPI frameworks and reporting dashboards for both technical and executive audiences.
Working knowledge of FinOps principles and cloud cost governance, with experience driving cost transparency and optimization at an organizational level.
Familiarity with SecOps tooling and security governance practices within a cloud or platform engineering context.
Bachelor's degree, preferably in a technical discipline (Computer Science, Mathematics, Engineering, or related field), or equivalent combination of education and experience.
15+ years of progressive experience in cloud engineering, platform reliability, or infrastructure roles with at least 5 years in senior engineering leadership.
AWS Solutions Architect Associate Certification or higher strongly desired.

Nice To Haves

Experience operating in a production change control process and working directly with audit and compliance functions in a regulated environment.
Experience in financial services or similarly regulated industries with exposure to CIS, NIST, and related frameworks.
Experience with GRC tooling or platforms used to manage risk, audit findings, and compliance obligations (e.g., ServiceNow, Archer, or equivalents).
Depth and breadth of experience in a highly regulated industry such as financial services, with demonstrated understanding of applicable rules and regulatory frameworks.
Google Cloud Professional Cloud Architect, Microsoft Azure Solutions Architect, or equivalent certification.
Relevant SRE or reliability-focused certifications a plus.

Responsibilities

Site Reliability Engineering: Lead the scaling and maturation of the SRE practice, establishing error budgets, SLOs, SLAs, and incident response frameworks across all platform services. Define and enforce reliability standards including on-call models, blameless postmortem processes, and corrective action tracking to drive continuous improvement. Partner with Platform Foundation teams (Kubernetes, Kafka, FinOps/Security) to embed reliability principles into build and operate models. Champion toil reduction through automation, ensuring engineering capacity is redirected from manual operations to higher-value platform capabilities.
Platform Engineering Governance & Compliance: Serve as Product Manager for the FinOps and SecOps domains within Platform Engineering, owning the product vision, prioritization, and stakeholder alignment for governance tooling and practices. Establish and maintain a governance framework ensuring Platform Engineering adheres to organizational standards across incident and problem management, SORTs, risk tracking, and audit findings. Own the end-to-end process for PE compliance obligations, ensuring timely resolution and closure of incidents, problem tickets, risk items, and audit observations with clear accountability and tracking. Partner with Risk, Compliance, and Security functions to proactively identify governance gaps, drive remediation, and ensure PE operates within the organization's risk appetite. Maintain visibility and reporting on PE's compliance posture across all obligation types, surfacing trends, aging items, and residual risks to CARE leadership and relevant stakeholders.
Site Reliability Engineering COE: Lead the scaling and maturation of the SRE practice, establishing error budgets, SLOs, SLAs, and incident response frameworks across all platform services. Define and enforce reliability standards including on-call models, blameless postmortem processes, and corrective action tracking to drive continuous improvement. Partner with Platform Engineering Product teams (Kubernetes, Kafka, FinOps/Security) to embed reliability principles into build and operate models. Champion toil reduction through automation, ensuring engineering capacity is redirected from manual operations to higher-value platform capabilities.
Cloud Strategy & Architecture: Define and execute the multi-year cloud architecture strategy aligned to business growth, scalability, regulatory compliance, and cost optimization goals. Establish cloud architectural standards, reference architectures, and governance frameworks (landing zones, identity, network patterns, service catalog) and drive adoption across engineering. Guide cloud-native architecture decisions including containers/orchestration, IaaS/PaaS adoption, disaster recovery, and multi-region patterns with a steady eye on regulatory requirements (e.g., CIS, NIST). Oversee technology roadmaps and end-of-life planning for cloud platform components, ensuring forward-looking decisions balance innovation with operational stability. Serve as a key technical advisor to senior leadership, translating complex architectural trade-offs into clear business decisions.
Metrics & Reporting: Own the platform metrics and reporting function, establishing a consistent framework for measuring platform health, engineering velocity, reliability, and cost efficiency across CARE. Define and track KPIs aligned to internal SLAs, executive reporting needs, and audit/compliance requirements. Ensure Jira and other platform tooling serve as the single source of truth for work visibility, with dashboards and reporting that enable data-driven prioritization. Build and maintain reporting cadences for leadership, including platform health scorecards, capacity forecasting, and risk transparency.
PM Coordination & Platform Delivery: Serve as the primary engineering leadership partner to the Platform Engineering Program Management function, ensuring platform initiatives are properly scoped, sequenced, and resourced. Drive alignment between engineering capacity and roadmap commitments, proactively surfacing dependency risks and trade-off decisions to the CARE Executive Director. Coordinate across PE domains to ensure cross-team delivery dependencies are managed and resolved effectively. Partner with Product and Engineering leaders outside of CARE to align platform capabilities to broader organizational roadmaps.
Leadership & Organizational Excellence: Lead, develop, and retain a high-performing team of engineering managers and individual contributors with clear ownership, career paths, and accountability frameworks. Foster a culture consistent with CARE operating principles: automation-first, full-stack ownership, stability as a prerequisite for velocity, and transparency through tooling. Manage budget for areas of responsibility; ensure adherence to schedules, work plans, and performance requirements. Oversee remediation of audit findings and observations within areas of responsibility, ensuring root cause is addressed, residual risk is reduced, and remediation is completed in a timely manner. Maintain appropriate work/life balance within teams while upholding a high standard of delivery quality. Other duties as assigned.
Supervisory Responsibilities: Manages a team of engineering managers and senior technical staff across Reliability Engineering, Cloud Architecture, and Metrics & Reporting functions.

Benefits

A hybrid work environment with up to 2 days per week of remote work
Tuition Reimbursement to support your continued education
Student Loan Repayment Assistance
Technology Stipend allowing you to use the device of your choice to connect to our network while working remotely
Generous PTO and Parental leave
401k Employer Match
Competitive health benefits including medical, dental and vision
