About The Position

At Rhoda AI, we're building the full-stack foundation for the next generation of humanoid robots, from high-performance, software-defined hardware to the foundation models and video world models that control it. Our robots are designed to be generalists, capable of operating in complex, real-world environments and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems, with a team that includes researchers from Stanford, Berkeley, Harvard, and beyond. We're not building a feature; we're building a new computing platform for physical work. With over $400M raised, we're investing aggressively in R&D, hardware development, and manufacturing scale-up to make that a reality.

We're looking for a Research Scientist or Research Engineer focused on model efficiency: making our foundation world models faster, smaller, and more deployable without sacrificing capability. This work is critical to closing the gap between research-scale models and real-time operation on robot hardware.

Requirements

  • Strong understanding of model compression and efficient architectures for large models
  • Hands-on experience with quantization, distillation, or pruning applied to transformers or large neural networks
  • Deep knowledge of where efficiency gains are possible in modern architectures
  • Proficiency with PyTorch and familiarity with hardware-aware optimization (CUDA, TensorRT, or similar)
  • Ability to run principled experiments that characterize capability-efficiency tradeoffs

Nice To Haves

  • PhD in ML, CS, or a related field — or equivalent research/engineering experience
  • Publication record at NeurIPS, ICML, ICLR, MLSys, or related venues
  • Experience with efficient video or multimodal model architectures
  • Familiarity with edge deployment targets (Jetson, custom ASICs, or mobile hardware)
  • Prior work on speculative decoding, early exit, or adaptive compute
  • Experience deploying compressed models on physical robots or latency-constrained systems

Responsibilities

  • Research and implement model compression techniques: quantization, pruning, structured sparsity, distillation, and low-rank approximation
  • Design efficient architectures and attention mechanisms suited to real-time inference on edge and robot hardware
  • Develop training strategies that produce better accuracy-efficiency tradeoffs from the start, rather than relying solely on compression after training
  • Profile and benchmark models across hardware targets to identify and resolve efficiency bottlenecks
  • Build evaluation frameworks that measure capability retention after compression or architecture changes
  • Collaborate with training systems and deployment teams to ensure efficient models translate to faster real-world inference
  • Publish and present work at top-tier venues (especially valued for the Research Scientist track)