VP of Engineering

Hyperbolic Labs, San Francisco, CA

About The Position

We are seeking a highly technical Vice President of Infrastructure to build and scale the foundational infrastructure powering our AI cloud platform. This is a hands-on executive leadership role. While you will own infrastructure strategy, organizational growth, and executive-level decision-making, we expect you to remain deeply engaged in architecture, design, and engineering execution. Expect to spend approximately 30-40% of your time contributing directly to technical design, architecture reviews, debugging critical production issues, and partnering with engineers on implementation. The ideal candidate has previously built and scaled cloud platforms, preferably GPU-native cloud infrastructure supporting AI training and inference workloads. You have operated at the intersection of executive leadership and hands-on engineering, and you are excited to help build both the technology and the team.

Requirements

  • 12+ years building and operating large-scale infrastructure systems
  • Experience leading infrastructure organizations while remaining hands-on technically
  • Previous experience building or operating a cloud platform at scale
  • Experience building GPU infrastructure or AI/ML compute platforms
  • Proven track record scaling infrastructure in high-growth startup environments
  • Expert-level Kubernetes knowledge
  • Experience designing and operating multi-region cloud infrastructure
  • Strong understanding of Linux, networking, distributed systems, and storage architecture
  • Experience with Infrastructure-as-Code and automation frameworks
  • Deep expertise in observability, monitoring, and reliability engineering
  • Experience building highly available production systems

Nice To Haves

  • Experience with GPU scheduling, Slurm, Kubernetes GPU operators, Ray, or distributed training systems
  • Experience managing thousands of GPUs in production environments
  • Background supporting AI training and inference platforms

Responsibilities

  • Lead the design and evolution of our AI cloud platform
  • Define the architecture for GPU orchestration, compute scheduling, networking, storage, and distributed systems
  • Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability
  • Personally participate in architecture reviews and key technical initiatives
  • Build and scale large GPU clusters supporting customer workloads
  • Design systems for GPU provisioning, scheduling, utilization optimization, and capacity management
  • Drive platform reliability and performance for AI training and inference workloads
  • Partner closely with engineering teams on infrastructure requirements for next-generation AI systems
  • Remain deeply involved in engineering decisions and technical direction
  • Contribute directly to infrastructure design and implementation efforts
  • Review architecture proposals, system designs, and major infrastructure changes
  • Act as the technical escalation point for complex infrastructure challenges
  • Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence
  • Build SRE and Platform Engineering functions from the ground up
  • Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning
  • Drive automation across infrastructure operations
  • Recruit and develop world-class Infrastructure, Platform, and SRE teams
  • Build a high-performance engineering culture focused on ownership and execution
  • Partner with executive leadership on company strategy and infrastructure investments
  • Manage infrastructure budgets, vendor relationships, and capacity planning