Principal Software Engineer, ML System Architect

Waymo | Mountain View, CA
Hybrid

About The Position

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver, The World's Most Experienced Driver™, to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo's fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads across 15+ U.S. states and tens of billions of miles in simulation.

Waymo's Systems Intelligence and ML (SIML) team works with Research and Production teams to develop and deploy models that are core to our autonomous driving software. AI is at the heart of this mission, and we are increasingly leveraging large-scale foundation models to unlock new capabilities for the Waymo Driver.

Join Waymo to architect and build a unified, large-scale AI platform that leverages Google DeepMind's (GDM) latest foundation models, such as Gemini, for comprehensive world understanding and generation, accelerating the development and distillation of the models that power the world's most experienced driver. In this hybrid role, you will report to our Director of Engineering, who leads Systems Intelligence and Machine Learning.

We are seeking a deeply experienced Principal Software Engineer to provide the overarching technical vision, architectural design, and cross-team leadership needed to make Waymo's next-generation foundation model systems a success. This role is pivotal in transforming Waymo's offboard ML landscape from a fragmented set of models and tools into a cohesive, efficient, and powerful platform centered on a unified foundation model recipe and deeply integrated with GDM's latest innovations, including Gemini.
You will be the technical authority defining how Waymo builds, trains, and utilizes these large models offboard to ultimately accelerate onboard deployment and improvements.

Requirements

  • Master's degree or PhD in Computer Science or a related field.
  • 12+ years of experience in software engineering, including at least 8 years focused on large-scale machine learning systems, deep learning frameworks, and AI infrastructure.
  • A track record of architecting and delivering complex, high-impact ML platforms or models.
  • Deep expertise in Python, C++, and ML frameworks like JAX, Gemax, and TensorFlow.
  • Extensive experience with large-scale distributed training on TPUs/GPUs and associated challenges.
  • Demonstrated ability to design robust, scalable, and maintainable software architectures and APIs.
  • Understanding of data pipelines, storage systems, and tokenization techniques.
  • Experience working effectively with research and product teams, and influencing across organizational boundaries.
  • Technical leadership skills, with the ability to drive strategy, influence across teams, and mentor other engineers.
  • Communication skills, with the ability to articulate a complex technical vision clearly and drive alignment around it.

Nice To Haves

  • Experience with multimodal and generative models.
  • Experience in autonomous vehicle systems or robotics.
  • Contributions to open-source ML frameworks or widely used internal tools.
  • Experience with simulation systems.

Responsibilities

  • Architect ML Systems: Define and drive the technical roadmap for the platform, encompassing codebase unification, data pipelines, model architecture, training recipes, and evaluation frameworks.
  • Codebase Consolidation & Best Practices: Lead the unification of the existing forked copies of foundation model component codebases into a single production-hardened, shared repository. Establish and enforce rigorous coding standards, testing practices, and API designs to ensure long-term codebase health and developer velocity.
  • GDM Integration & API Definition: Serve as the primary technical interface between Waymo's offboard model development and GDM's core model and framework teams. Define clear APIs and integration patterns, ensuring Waymo can seamlessly leverage and contribute to GDM's advancements while maintaining stability and control.
  • Unify Core Components: Drive the consolidation of tokenization/de-tokenization strategies, data formats, input pipelines, and evaluation methodologies across all offboard Foundation Model use cases.
  • Scalable Training & Distillation: Architect for efficient large-scale distributed training and establish a common, efficient distillation setup to transfer knowledge from large teacher models to onboard student models.
  • Technical Leadership & Influence: Provide technical mentorship, guidance, and direction to engineers across multiple teams within SIML (Systems Intelligence and Machine Learning) and AI Foundations. Drive alignment on technical decisions with senior stakeholders across Waymo and GDM.
  • Drive Efficiency: Instill a culture of efficiency in model development, training, and resource utilization, aiming for high ML Productivity Goodput.