Wayve • Posted 1 day ago
Full-time • Principal
Sunnyvale, CA
501-1,000 employees

As the Principal Engineer for the Model Development Platform at Wayve, you will own the end-to-end architecture that powers every aspect of our AI model lifecycle—from raw data ingestion to model training, experiment scheduling, and on-road testing. Sitting at the intersection of cutting-edge AI research, large-scale distributed systems, and robotic operations, you will ensure the reliability, scalability, and coherence of the systems that enable Wayve’s researchers and engineers to iterate rapidly and deploy autonomous driving models safely. You will partner closely with the Head of Model Dev Platform to define and execute the technical vision for the organization, aligning infrastructure and tooling with company-wide goals. You will lead by technical example—diving deep into complex challenges across web applications, distributed compute orchestration, ML Ops, data pipelines, and optimization algorithms. Your architectural insight and mentorship will empower teams to deliver world-class platform capabilities that measurably accelerate model development and fleet learning.

  • System Architecture & Reliability – Design and evolve the overarching architecture of the model development platform, ensuring system-wide reliability, observability, and scalability. Define key performance, latency, and availability targets across diverse components and drive the engineering standards needed to achieve them.
  • Cross-Domain Technical Leadership – Work across disciplines (from front-end web UIs to large-scale distributed training, from Spark-based data pipelines to experiment scheduling algorithms using linear optimization) to unify the platform’s architecture and ensure smooth interoperability between systems.
  • Hands-On Problem Solving – Dive deep into the thorniest technical challenges faced by individual subteams, bringing your expertise in distributed systems, large-scale compute, and system design to bear. Drive architectural reviews and propose pragmatic solutions that balance innovation with operational simplicity.
  • Experimentation & Scheduling Systems – Develop and refine systems that optimize how models are tested, whether in simulation or on-road, balancing constraints like hardware availability, safety requirements, and research priorities. Use algorithmic techniques (e.g., linear programming, heuristic optimization) to improve throughput and turnaround time.
  • Data & Compute Infrastructure – Architect data processing pipelines capable of ingesting, transforming, and enriching petabytes of sensor data from the global fleet. Ensure efficient compute utilization across heterogeneous environments (GPU, CPU, cloud, and edge), supporting both rapid prototyping and large-scale production training.
  • Mentorship & Engineering Excellence – Serve as a mentor and coach for engineers across the organization: developing technical talent, improving design practices, and fostering a culture of learning and technical excellence. Act as a trusted advisor to senior engineers and a role model for engineering craft.
  • Strategic Collaboration – Partner with Product Management, Research, and Operations to align technical architecture with user needs and product vision. Co-develop the long-term roadmap for the Model Dev Platform, balancing innovation with reliability and maintainability.
  • Technical Leadership at Scale – 10+ years of experience designing and building large-scale distributed systems, ML/AI infrastructure, full-stack web applications, or developer platforms, including at least 3 years as a staff- or principal-level engineer.
  • Architectural Depth & Breadth – Proven ability to design systems spanning web platforms, ML pipelines, and large-scale compute orchestration (e.g., Spark, Ray, Kubernetes, Airflow, MLflow).
  • Reliability & Performance Mindset – Experience driving platform reliability improvements, defining SLAs/SLOs, and building self-healing and observable systems that operate at “four nines” availability or better.
  • Hands-On Systems Design – Deep understanding of distributed computing, workflow orchestration, data modeling, and API design, with the ability to write and review production-quality code.
  • Collaborative Influence – Excellent communication and cross-functional collaboration skills; ability to guide engineers, managers, and researchers toward unified technical direction.
  • Mentorship & Culture – Demonstrated success in mentoring engineers across levels and cultivating a culture of engineering excellence.
  • Education – Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent experience (advanced degree preferred).
  • Optimization & Scheduling Expertise – Experience applying algorithmic or mathematical optimization (e.g., linear programming, graph algorithms) to operational or scheduling problems.
  • ML Ops & Experimentation Systems – Familiarity with end-to-end model lifecycle tooling, from data ingestion and training CI to model artifact tracking and evaluation workflows.
  • Domain Experience – Prior exposure to autonomous systems, robotics, or other safety-critical domains.
  • Full-Stack Fluency – Experience with modern web frameworks (e.g., React, Flask, FastAPI) and how they integrate into backend systems.
  • Data Governance – Understanding of data privacy, compliance, and secure handling practices for large-scale sensor data.