Director of Cloud Infrastructure

SAP Taulia

Remote

About The Position

We are seeking a Director of Operations to lead Taulia’s global Cloud Operations function, including Cloud Ops, Release Management, and the Network Operations Center (NOC). This role is accountable for the reliability, availability, performance, and operational excellence of Taulia’s production services across three global cloud data centers, enabling a ~170-person engineering organization to deliver safely and quickly to customers. Our cloud footprint is predominantly on Google Cloud Platform (GCP).

You will lead a team of approximately 20 operations engineers distributed across regions and time zones, and you will set the standards for incident response, operational readiness, change/release governance, monitoring/observability, and continuous improvement.

This is a highly cross-functional leadership role that partners closely with Development, Architecture, Security, Product, and Customer Support to ensure Taulia’s platform is stable, secure, scalable, cost-effective, and compliant. This role will also be responsible for operationalizing and instilling SAP infrastructure guidelines and ensuring ongoing PCI DSS compliance across our production environments and operating practices.

Why Join Us

At Taulia, Operations is a force multiplier for engineering and a critical driver of customer trust. In this role, you will:

Own reliability and operational outcomes for a global fintech platform operating at enterprise scale.
Lead and develop a strong, distributed team and modernize operating practices across Cloud Ops, Release, and the NOC.
Shape how Taulia delivers change by improving deployment safety, reducing operational risk, and accelerating time-to-value.
Drive measurable improvements across availability, incident response, observability, automation, cost optimization, and operational maturity.
Help define and embed SAP-aligned infrastructure standards while ensuring compliance for regulated environments (including PCI DSS).

Requirements

10+ years in an operations leadership role (e.g., Cloud Operations, SRE/Production Engineering, Infrastructure Operations, NOC leadership), including responsibility for production availability and incident response, along with prior experience as an operations engineer and/or development engineer.
Experience leading globally distributed teams supporting 24x7 operations and on-call programs.
Proven track record owning operational outcomes for SaaS/cloud platforms, including incident management, change/release processes, and service reliability.
Strong experience with cloud infrastructure and operations (Google Cloud Platform preferred), including multi-region architecture and disaster recovery patterns.
Demonstrated experience operating in regulated or compliance-driven environments, including PCI DSS or similar control frameworks.
Demonstrated ability to partner effectively across Engineering, Security, and Product to balance delivery speed with operational risk and compliance requirements.
Experience implementing operational processes and governance (e.g., ITIL-inspired incident/problem/change management) in a pragmatic way.
Strong communication skills with the ability to lead under pressure, align stakeholders, and provide executive-ready reporting.
Production operations leadership: incident command, escalation management, service ownership, operational readiness.
Release management and change governance: risk management, dependency coordination, deployment controls.
Observability strategy: monitoring/alerting, logging, tracing, SLOs, dashboards, alert quality improvements.
Reliability engineering practices: automation, reducing toil, post-incident learning, resilience testing, DR planning.
Cloud fluency (GCP preferred): infrastructure, networking fundamentals, security considerations, capacity/performance planning.
Compliance and controls mindset: audit readiness, evidence-driven operations, PCI DSS-aligned operational practices.
Cost management / FinOps: cost visibility, forecasting, optimization, and governance for cloud environments.
Metrics-driven management: MTTR, availability, change failure rate, deployment frequency, operational KPI design.
People leadership: hiring, coaching, performance management, org design, operating cadence.

Responsibilities

Operational Ownership: Manage production operations across three global GCP data centers, prioritizing 24/7 availability, scalability, and resilience.
Standardization: Evolve operational standards, including runbooks, escalation paths, and operational readiness reviews (ORR).
Reliability Engineering: Collaborate with Engineering to define SLOs/SLAs, manage error budgets, and oversee capacity and disaster recovery planning.
Deployment Safety: Lead release planning and change control; partner with Engineering on progressive delivery and automated rollbacks.
Governance: Enforce go/no-go frameworks and ensure all changes meet SAP guidelines and PCI DSS standards.
Continuous Improvement: Drive blameless post-mortems to improve change success rates and system stability.
Incident Command: Lead major incident response (IR) and establish clear command structures for rapid resolution.
Root Cause Analysis (RCA): Ensure corrective actions are prioritized and executed to prevent systemic recurrence.
Reporting: Track and report on reliability trends, MTTR, and long-term remediation progress for executive leadership.
Strategic Tooling: Define the strategy for logging, tracing, and monitoring to ensure rapid issue detection.
Toil Reduction: Drive automation across provisioning and routine tasks to eliminate manual intervention and human error.
Data-Driven Culture: Utilize operational KPIs (MTTR, change failure rate, toil metrics) to guide process improvements.
Regulatory Alignment: Maintain audit readiness for PCI DSS and SAP infrastructure guidelines; manage access controls and evidence collection.
Cloud Economics: Implement FinOps practices to monitor and optimize cloud spend, focusing on right-sizing and waste reduction without sacrificing performance.
Vendor Management: Oversee key partnerships for infrastructure, monitoring, and CI/CD tooling.
Team Growth: Lead and coach a global team of ~20 engineers, defining career paths and performance expectations.
Culture: Foster an inclusive, high-performing environment with strong cross-functional rhythms across Security, Product, and Architecture.