Principal Engineer, Data & Compute

Wayve, Sunnyvale, CA
Hybrid

About The Position

At Wayve, we are teaching machines to drive—not by coding rules, but by training end-to-end neural networks that learn from vast streams of real-world data. Achieving this requires unprecedented scale in both data infrastructure and compute orchestration. Our workloads span thousands of GPUs, petabytes of driving data, and geographically distributed training and inference clusters. As Architect for AI Infrastructure, you will design and guide the evolution of the foundational compute and storage systems that fuel our model development lifecycle. Your leadership will directly accelerate AI research, enable rapid model deployment, and ensure our platform meets the demands of a company pushing the boundaries of autonomy. You’ll sit at the strategic core of AI, systems, and cloud infrastructure—owning challenges that few companies have the ambition or scale to tackle.

Requirements

  • 10+ years designing and building large-scale distributed systems, with at least 4 years focused on GPU-based cloud infrastructure.
  • Proven experience enabling large-scale AI training, inference, or computer vision workloads in GPU clusters.
  • Deep understanding of petabyte-scale data architecture, including storage federation, high-throughput access, and data locality for AI workloads.
  • Strong technical leadership with a track record of defining and communicating architectural strategy, balancing long-term vision with delivery needs.
  • A natural mentor with a history of developing engineers and influencing technical direction across teams.
  • Advanced degree in Computer Science, Electrical Engineering, or a related field—or equivalent industry experience.

Nice To Haves

  • Experience with multi-cloud orchestration, particularly in latency- or cost-sensitive training and inference pipelines.
  • Familiarity with systems like Ray, Kubernetes, Airflow, or Flyte, and deep fluency in AI/ML job scheduling, model lifecycle management, and infrastructure-as-code practices.
  • Background in supporting safety-critical or real-time inference use cases (e.g., robotics, autonomous vehicles, aerospace).
  • Passion for building infrastructure-as-a-product that delivers performance and simplicity to research and product teams alike.

Responsibilities

  • Global Compute Strategy – Define and evolve the architecture for how Wayve allocates and orchestrates training and inference workloads across thousands of GPUs and multiple data centers, ensuring optimal throughput, resiliency, and cost efficiency.
  • Petabyte-Scale Data Federation – Design systems that enable fast, reliable access to high-volume sensor and simulation data across geographies, ensuring the right data is always available for training, evaluation, and inference. Prepare Wayve to become an exabyte-scale company.
  • Cross-Region GPU Job Execution – Build the foundations that enable large-scale AI workloads to run seamlessly across hybrid and multi-cloud environments.
  • Cloud Infrastructure Advisory – Act as a trusted partner to leadership in aligning compute investments and architecture with company strategy, growth plans, and performance goals.
  • Technical Leadership & Mentorship – Uplift the broader engineering org through architectural coaching and technical deep dives, and by cultivating a culture of operational and engineering excellence.