Zoox's internship program offers hands-on experience with cutting-edge technology, mentorship from some of the industry's brightest minds, and the opportunity to make meaningful contributions to real projects. The program seeks interns who demonstrate strong academic performance, engagement beyond the classroom, intellectual curiosity, and a genuine interest in Zoox's mission.

During this internship, the intern will lead the development of a multi-modal (vision, LiDAR, radar, and language), temporal foundation encoder to support 3D object detection and tracking, 3D segmentation (occupancy), and live maps. This Multi-Modal Foundation Encoder (MMFE) is critical to achieving End-to-End Perception at Zoox. The research aims to significantly improve system performance on long-tail events and rare classes by using a large-capacity foundation model to learn rich representations across sensor modalities. The project also aims to improve perception in adverse environmental conditions (for example, handling medium to heavy rain and fog, and reducing false positives on water splashes or dust particles), achieve long-range sensing for highway driving, and build robustness to occlusion.

This is a highly research-driven role with the goal of publication. It offers the opportunity to explore novel directions such as tri-modal foundation models with self-supervised pre-training, radar-language grounding for zero-shot detection, efficient sensor fusion via sparse cross-attention, or integrating 3D Gaussian Splats for dynamic agent geometry and streaming sparse Gaussian occupancy prediction.
Job Type: Full-time
Career Level: Intern
Education Level: Ph.D. or professional degree
Number of Employees: 501-1,000 employees