Senior/Staff Software Engineer - Machine Learning & System Optimization

Zoox • Boston, MA

About The Position

The Perception team is pioneering the development of a multimodal foundation model to drive the next generation of autonomous system intelligence. As a Machine Learning and System Optimization Engineer, you will orchestrate and allocate overall system capacity across the core perception models running on the robot, and drive large initiatives that make inference more efficient by sharing parts of the perception stack across models. You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, including LLMs, VLMs, or foundation models, on power- and thermal-constrained vehicle SoCs. In addition, you will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

Requirements

Deep experience in system and performance optimization in CPU/GPU systems designed for low latency or high throughput.
Deep expertise in working with real-time systems and their constraints, such as processing latency, memory utilization, and memory bandwidth pressure.
Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference (INT8, FP8, FP4, BF16/FP16).
Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML ops and TensorRT plugins with efficient CUDA kernel implementations.
Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.

Nice To Haves

Prior experience in high-performance robotics applications such as autonomous vehicles, drones, or robots.
Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi-modal sensor processing (Vision, LiDAR, Radar).
Experience with end-to-end autonomous driving paradigms (VLM/VLA models, foundation models) and edge deployment technologies (e.g., TensorRT-LLM).

Responsibilities

Allocate and distribute system resources (CPU/GPU/interconnect) to various models and inference engines running on the robot.
Spearhead cross-cutting initiatives that improve compute utilization through model sharing and fusion and better scheduling strategies.
Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
Write production-level, low-latency, and memory-safe C++ and CUDA code for real-time inference on vehicle systems.