Research Scientist, Spatial AI & Perception

Boston Dynamics•Waltham, MA

$177,000 - $225,000•Onsite

About The Position

As a Spatial AI Research Scientist on the Atlas VLA Research team, you will build the perception and geometric reasoning systems that give Atlas a grounded 3D understanding of the world. Your work spans the full spectrum, from real-time SLAM and state estimation on humanoid hardware to offline reconstruction pipelines that produce the geometric scene structure used to train and condition large VLM/VLA models. You will design real-time SLAM and perception-based state estimation that runs on Atlas, develop offline 3D reconstruction pipelines that turn teleop and robot logs into high-fidelity geometric data, and pursue spatial AI research that grounds language and vision in 3D geometry so that learned policies can reason about space, not just pixels. You'll collaborate closely with perception, robotics, ML, and system software specialists and rapidly test your work on state-of-the-art hardware.

Requirements

PhD in Robotics, Computer Vision, Machine Learning, Computer Science, or a related field (or equivalent research experience).
Prior experience building and deploying SLAM, visual odometry, or 3D reconstruction systems for robots or autonomous vehicles.
Strong background in one or more of the following:
Real-time SLAM, visual-inertial odometry, and state estimation
3D reconstruction (SfM, MVS, multi-view geometry, neural/implicit reconstruction)
Probabilistic state estimation and sensor fusion (factor graphs, filtering, optimization on manifolds)
Spatial representations, grounding language/vision into 3D geometry, geometric foundation models
Solid foundation in the math underlying geometric perception (Lie groups, nonlinear optimization, multi-view geometry).
Strong analytical and debugging skills; ability to write reliable, well-structured research code in C++ and Python.

Nice To Haves

Experience with modern ML frameworks (PyTorch, JAX) and an understanding of how perception outputs feed large-scale model training.
Experience building reconstruction or data pipelines that produce training data for large vision or VLA models.
Familiarity with VLA / large behavior models and how spatial grounding improves manipulation and long-horizon behavior.
Publications in top-tier computer vision, ML, or robotics conferences (e.g., CVPR, ICCV, ECCV, RSS, ICRA, CoRL).

Responsibilities

Design and implement real-time SLAM and perception-based state estimation for a mobile humanoid or specialized data collection devices operating in unstructured, dynamic environments.
Build offline 3D reconstruction pipelines (multi-view geometry, SfM/MVS, neural reconstruction, depth/pose fusion) that generate geometric scene structure to inform and supervise large VLM/VLA training.
Pioneer research integrating large VLA and VLM models with 3D spatial perception to enable semantic, language-grounded scene reasoning.
Bridge classical geometric methods and learned approaches, knowing when to use optimization-based estimation versus learned representations and how to combine them.
Write high-quality, maintainable C++ and Python code that fits into a large production codebase.