Tesla is solving robust embodied intelligence through humanoid robots. You will join the team building the vision and multimodal foundation models that allow Optimus to understand, interact, and navigate the world. Working at the intersection of large-scale deep learning and real-world robotics, you will own the full model lifecycle - from designing massive-scale data engines and auto labeling pipelines to architecting novel neural networks and deploying them to thousands of robots. Design and build scalable data pipelines, utilizing classical computer vision and off-the-shelf models to auto label massive video datasets Develop and train state-of-the-art foundational vision and multimodal networks, focusing on efficient tokenization, compression, and fusion of vision, language, audio, and tactile data Iterate on neural network architectures to improve perception, 3D understanding, and world modeling capabilities Optimize models for deployment on edge compute, utilizing quantization, pruning, and distillation to ensure real-time performance on the robot Collaborate with hardware and controls engineers to close the loop between perception and actuation in the real world
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
5,001-10,000 employees