Sr. Director, Platform Engineering & SRE

Ministry Brands • Alpharetta, GA

About The Position

As Sr. Director, Platform Engineering & SRE, you will build and lead the function responsible for the reliability, performance, and operational excellence of the Ministry Brands platform. You will own site reliability engineering, observability, production operations, and cloud engineering across our multi-cloud SaaS portfolio — establishing the practices, tooling, and standards that keep our products available and performant for the organizations we serve. This is a hands-on leadership role at the center of our most important technical priority: platform stability. You will define and drive measurable improvements in availability and incident response, stand up a modern SRE discipline, and partner closely with R&D, Product, and Security leaders to embed reliability into how we build and operate software. You will be accountable to executive leadership for platform availability and performance.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience
10+ years of overall experience in software, infrastructure, or platform engineering
6+ years in engineering leadership or management roles
Demonstrated track record building or scaling a Site Reliability Engineering or Platform Engineering function and improving availability/reliability outcomes
Deep, hands-on cloud experience at SaaS scale (Azure and/or AWS), including infrastructure-as-code and CI/CD
Strong background across: Site Reliability Engineering, Observability & Monitoring, Cloud & Infrastructure Engineering, Incident & Performance Management, Capacity Planning, and Production Operations

Nice To Haves

Experience operating multi-cloud and multi-tenant SaaS environments
Hands-on implementation of SLO/error-budget frameworks and modern observability tooling (e.g., Datadog, Grafana, Prometheus, OpenTelemetry)
Experience standing up reliability in a distributed or embedded (product-team) model
Exposure to SOC 2 and PCI DSS 4.0.1 control evidence at the infrastructure layer
Background in a private-equity-backed or high-growth SaaS environment
Demonstrated business acumen and sound decision-making in complex, multi-product environments

Responsibilities

Establish and own service-level objectives (SLOs), service-level indicators (SLIs), and error-budget policy across the product platform
Lead incident command, on-call rotation, escalation, and a blameless postmortem culture; drive measurable reduction in MTTR and change-failure rate
Set reliability standards and partner with embedded reliability engineers in R&D product teams to apply them at the point of system design
Drive availability toward enterprise targets and own the reliability roadmap and its reporting to executive stakeholders
Build and operate the observability platform — metrics, logs, traces, and alerting — and define the golden signals and dashboards used across products
Lead capacity planning, performance engineering, and operational-readiness reviews for new and existing services
Own production operations practices, runbooks, and escalation workflows that improve transparency, stability, and stakeholder communication
Deliver metrics-based reporting on platform availability and performance
Lead cloud engineering across our multi-cloud footprint (Azure, AWS, GCP), balancing reliability, performance, security posture, and cost
Own infrastructure-as-code, CI/CD platform standards, and the internal developer platform that product teams build on
Drive consolidation and standardization of fragmented infrastructure and pipeline tooling
Partner with Security to implement and evidence platform-layer controls in support of SOC 2 and PCI DSS objectives
Define team culture and objectives aligned to Enterprise IT & Security strategic goals; build, coach, and develop the Platform Engineering & SRE team
Build and maintain strong partnerships with R&D, Product, Security, and IT leaders
Develop and manage the platform engineering budget, balancing run-the-business needs with strategic investment, and author clear business cases for technology investments
Manage key cloud and tooling vendor relationships in partnership with IT and Procurement
Present updates, metrics, and recommendations to both technical and business stakeholders

Benefits

Robust healthcare options – Options include a plan that is 100% covered by Ministry Brands for employee-only coverage, as well as a generous HSA contribution by the company.
Flexible paid time off
Flexible work schedules
PTO for vacation
Up to 80 hours of paid sick/safe leave
11.5 days of fully paid holidays
Paid parental leave
Mental health support through an Employee Assistance Program
Professional development reimbursement
Employee Recognition & Rewards through Nectar