Research Engineer/Research Scientist - Efficient Modeling

Rhoda AI • Palo Alto, CA

About The Position

At Rhoda AI, we're building the full-stack foundation for the next generation of humanoid robots, from high-performance, software-defined hardware to the foundational models and video world models that control it. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems, with a research team that includes researchers from Stanford, Berkeley, Harvard, and beyond. We're not building a feature; we're building a new computing platform for physical work. With over $400M raised, we're investing aggressively in the R&D, hardware development, and manufacturing scale-up to make that a reality.

We're looking for a Research Scientist or Research Engineer focused on model efficiency: making our foundation world models faster, smaller, and more deployable without sacrificing capability. This work is critical to closing the gap between research-scale models and real-time operation on robot hardware.

Requirements

Strong understanding of model compression and efficient architectures for large models
Hands-on experience with quantization, distillation, or pruning applied to transformers or large neural networks
Deep knowledge of where efficiency gains are possible in modern architectures
Proficiency with PyTorch and familiarity with hardware-aware optimization (CUDA, TensorRT, or similar)
Ability to run principled experiments that characterize capability-efficiency tradeoffs

Nice To Haves

PhD in ML, CS, or a related field — or equivalent research/engineering experience
Publication record at NeurIPS, ICML, ICLR, MLSys, or related venues
Experience with efficient video or multimodal model architectures
Familiarity with edge deployment targets (Jetson, custom ASICs, or mobile hardware)
Prior work on speculative decoding, early exit, or adaptive compute
Experience deploying compressed models on physical robots or latency-constrained systems

Responsibilities

Research and implement model compression techniques: quantization, pruning, structured sparsity, distillation, and low-rank approximation
Design efficient architectures and attention mechanisms suited to real-time inference on edge and robot hardware
Develop training strategies that produce better accuracy-efficiency tradeoffs from the start
Profile and benchmark models across hardware targets to identify and resolve efficiency bottlenecks
Build evaluation frameworks that measure capability retention after compression or architecture changes
Collaborate with training systems and deployment teams to ensure efficient models translate to faster real-world inference
Publish and present work at top-tier venues (especially valued for the Research Scientist track)
