Senior Machine Learning Engineer - 3D Segmentation

Zoox • Boston, MA

About The Position

The Perception team at Zoox is responsible for the robot's understanding of the world, fusing data from Lidar, Radar, and Cameras to create a unified representation of the environment. In this role, you will contribute to the development of our next-generation 3D occupancy and segmentation networks. You will architect and optimize high-performance deep learning models that generate dense, temporally consistent voxel representations of the driving environment. This work is critical for enabling our vehicle to navigate complex urban scenarios, handle rare obstacles, and drive safely in tight spaces by providing precise geometry and motion estimates to downstream planners.

Requirements

MS or PhD in Computer Science, Robotics, Machine Learning, or a related field with 6+ years of industry experience.
Deep expertise in 3D Computer Vision and Deep Learning, specifically with voxel-based or BEV (Bird's Eye View) architectures.
Strong proficiency in Python and deep learning frameworks (PyTorch) for model design and training, as well as some experience in C++ for model integration.
Experience with multi-sensor fusion (Lidar, Camera, Radar) and handling temporal data sequences.
Experience with occupancy networks, implicit representations (NeRF/Gaussian Splats), or scene flow estimation.

Nice To Haves

Experience optimizing models for TensorRT/CUDA to achieve low-latency inference.
Familiarity with sparse convolutions or query-based architectures for efficient 3D processing.
Experience with Vision Language Models or multi-modal 3D foundation models.

Responsibilities

Design and implement state-of-the-art multi-modal sensor fusion architectures (Lidar, Camera, Radar) to predict 3D occupancy, semantic segmentation, and flow.
Develop "vision-first" fusion strategies to enhance geometric understanding and reduce dependency on sparse sensor modalities.
Engineer temporal processing modules to improve the stability and consistency of predictions over time.
Optimize model architectures for real-time on-vehicle inference, balancing high-fidelity range extension with strict latency constraints.
Collaborate with downstream consumers (Tracking, Prediction, Planner) to refine geometric outputs, such as contours and free-space estimations, for complex maneuvering.