About The Position

As the Senior Director of SRE & Cloud Infrastructure, you will lead the teams responsible for the reliability, scalability, and cost-efficiency of our data security platform. You will own the infrastructure and operational foundations that power our engineering organization and customer-facing products, operating at massive scale under rigorous performance, reliability, and fault-tolerance requirements. You’ll set strategy, grow and mentor teams, and still dive deep into architecture, incidents, and hard technical decisions. You’ll partner closely with Engineering, Product, Security, and Finance leadership to scale our infrastructure sustainably, manage COGS, and continuously improve developer and operational experience. You’ll play a key role in shaping our engineering culture, operational rigor, and AI-driven approach to reliability and efficiency as we scale. This role will report to the SVP of Engineering.

Requirements

  • You’ve led SRE and Infrastructure organizations at high-growth SaaS, platform, or security companies
  • You are a strong technical leader with deep experience in cloud-native systems and a strong SRE mindset
  • You have a strong background in Kubernetes, cloud platforms (GCP and/or AWS), and infrastructure as code (Terraform or equivalent)
  • You’ve designed or operated large-scale distributed systems, real-time data pipelines, or high-throughput platforms
  • You have experience owning COGS, cloud spend, and efficiency metrics, and can clearly communicate tradeoffs to executives
  • You’re comfortable operating at multiple levels: strategic planning, architectural reviews, and deep technical problem solving
  • You use data and metrics to drive reliability, performance, cost optimization, and team productivity
  • You have a proven track record of scaling teams and systems while maintaining high reliability and velocity
  • You’re an empathetic leader who fosters inclusion, ownership, accountability, and psychological safety
  • You thrive in fast-moving environments and are comfortable navigating ambiguity and change

Responsibilities

  • Lead, grow, and mentor high-performing globally distributed SRE and Infrastructure teams, including managers and senior ICs
  • Own the reliability, availability, scalability, and performance of our production and developer platforms
  • Define and execute the SRE and infrastructure strategy, including cloud architecture, Kubernetes platforms, CI/CD, and automation
  • Drive horizontal scaling and enable teams to operate independently by decoupling and modularizing both architecture and processes
  • Drive infrastructure cost (COGS) optimization, capacity planning, and cloud financial management in close partnership with Finance and Engineering leadership
  • Establish and evolve SLOs, SLIs, error budgets, and operational best practices across the organization
  • Oversee incident management, postmortems, and continuous improvement, ensuring a strong culture of learning and ownership
  • Collaborate closely with security to ensure our infrastructure is secure, compliant, and resilient by design
  • Contribute to and uphold strong documentation, operational standards, and knowledge sharing across teams