Machine Learning: Multimodal Foundation Models

The Bot Company • San Francisco, CA

About The Position

We're building a helpful robot for every home. We're a small team of engineers, designers, and operators based in San Francisco. Our team comes from Tesla, Cruise, OpenAI, Google, Pixar, and many other great companies. We've previously shipped products to hundreds of millions of users and know what it takes to build amazing products and experiences.

Our team is deliberately lean to promote rapid decision-making and to do away with bureaucracy and hierarchy. Everyone is an individual contributor (IC) and is empowered with massive scope, radical ownership, and direct responsibility. We work across the stack with a culture built for rapid iteration and fast execution.

We are building unified foundation models that natively reason across text, image, video, and kinematics to drive intelligent robotic policies. You will work on large multimodal networks and own the entire stack, from data to training to deploying models.

Requirements

Very strong coding skills in Python, C++, or Rust.
Production MLLM Experience: A track record of training and deploying large-scale multimodal large language models (MLLMs).
Pretraining & RL Mastery: Deep intuition for LLM-style pretraining, post-training, and reinforcement learning (RL) at scale.
Infrastructure Fluency: Comfortable managing and optimizing large-scale experiments on massive GPU clusters.

Responsibilities

Build Native Multimodal Policies: Develop architectures in which vision, language, and other modalities share a unified representation.
Improve Cross-Modal Reasoning: Research and implement methods to ensure the model doesn't just "associate" modalities but actually reasons through them (e.g., grounding visual physics in kinematic constraints).
Own the Training Loop End-to-End: Design, run, debug, and iterate on large-scale training experiments: diagnose failure modes, improve data mixtures, and tighten evaluation to drive measurable gains.
Ship and Iterate on Real Systems: Integrate models into real robotic stacks, write the robot-side code needed to deploy your models, and optimize performance for edge inference.
