About The Position

At Rhoda AI, we're building the full-stack foundation for the next generation of humanoid robots, from high-performance, software-defined hardware to the foundational models and video world models that control it. Our robots are designed to be generalists, capable of operating in complex, real-world environments and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems, with a research team drawn from Stanford, Berkeley, Harvard, and beyond. We're not building a feature; we're building a new computing platform for physical work. With over $400M raised, we're investing aggressively in the R&D, hardware development, and manufacturing scale-up needed to make that a reality.

We're looking for Research Scientists and Research Engineers to push the frontier of large-scale pre-training for our video action model. Our approach formulates robot control as video prediction: we pre-train causal video generation models on web-scale video data, then adapt them to predict robot actions from real-world demonstrations. You'll work on the core architectures, training objectives, and scaling strategies that determine how well our models learn from internet-scale video. We hire across levels, from senior to staff, and welcome both research-track and engineering-track candidates.

Requirements

  • Strong background in large-scale generative modeling — either video generation (autoregressive video models, diffusion transformers, causal video architectures) or language model pretraining (LLMs, autoregressive transformers at scale)
  • Hands-on experience training large generative models from scratch at scale
  • Deep understanding of autoregressive modeling, causal architectures, and scaling behavior
  • Fluency with modern ML frameworks (PyTorch required; JAX a plus)
  • Ability to design experiments, interpret results, and iterate quickly
  • Strong research taste: ability to identify high-leverage questions and cut through noise
  • Comfort operating in a fast-moving, ambiguous startup environment
  • Staff-level candidates are expected to define technical direction and drive research strategy independently; senior/MTS candidates are expected to execute complex projects with strong fundamentals and take on growing scope

Nice To Haves

  • PhD in ML, CS, Robotics, or a related field — or equivalent research/industry experience
  • Strong publication record at NeurIPS, ICML, ICLR, CVPR, CoRL, etc. (especially valued for RS track)
  • Prior work specifically on video generation models (autoregressive video, diffusion transformers, world models, or causal video architectures)
  • Experience with large-scale autoregressive language model pretraining and scaling
  • Familiarity with web-scale video datasets and video data curation pipelines
  • Prior work connecting video generation to control, action prediction, or robotic learning
  • Familiarity with distributed training and multi-node infrastructure

Responsibilities

  • Design and train large-scale causal video generation models on web-scale video data
  • Develop and validate training objectives, model architectures, and data mixtures for video prediction at scale
  • Research scaling laws and data efficiency for web-scale video pretraining
  • Investigate what properties of web video transfer most effectively to robotic control and action prediction
  • Build systematic evaluations to measure video generation quality, long-horizon prediction fidelity, and downstream robot task performance
  • Run rigorous ablations and benchmarking to understand what drives model quality at scale
  • Collaborate closely with data & evaluation, post-training, and training systems teams to translate research ideas into working systems
  • Publish and present work at top-tier ML and robotics venues (especially valued for RS track)

Benefits

  • High ownership and fast iteration in a small, elite team
© 2026 Teal Labs, Inc