About The Position

We are looking for an outstanding AI Robotics Research Intern to join the team at NIO. This role operates at the cutting edge of embodied AI and dexterous manipulation, with a specific focus on using large-scale foundation models and learning from human data to equip robots with physical-world intelligence. As an intern, you will tackle the fundamental challenges of dexterous manipulation by harvesting human-object interaction data from diverse sources, ranging from unstructured web videos to high-fidelity data collected with instrumented gloves. Your work will involve translating these rich human insights into executable robotic behaviors, bridging the gap between human dexterity and machine execution. You will also be responsible for deploying these policies on real hardware to perform complex, contact-rich tasks in real-world environments.

Requirements

  • Master’s or Ph.D. in Robotics, Computer Science, Artificial Intelligence, Mechanical/Electrical Engineering, or related fields.
  • Strong technical foundation in robot learning and control, including areas such as reinforcement learning, imitation learning, world modeling, or representation learning for agent-environment interactions.
  • Practical experience implementing and fine-tuning generative models and Transformer architectures.
  • Hands-on experience with robotic manipulation systems, particularly involving contact-rich interaction, grasping, or multi-sensor perception (e.g., tactile, force/torque, proprioception).
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch, JAX, TensorFlow), with experience using robotics middleware or simulation tools (e.g., ROS/ROS2, MuJoCo, Isaac Sim, PyBullet).
  • Demonstrated ability to implement, experiment, and iterate on research ideas, including evaluating methods through empirical results on simulated or physical robotic systems.
  • Strong analytical and system-building skills, with the ability to work across simulation, learning, perception, control, and real robot deployment as part of a larger technical team.

Nice To Haves

  • Ph.D. (or Ph.D. candidate expecting graduation within 6–12 months).
  • Prior experience with dexterous manipulation, multi-finger robotic hands, in-hand manipulation, or grasp optimization beyond parallel-jaw grasping.
  • Experience deploying learning-based policies on real robotic hardware, including exposure to sim-to-real transfer challenges such as contact mismatch, compliance, sensing noise, or latency.
  • Familiarity with contact modeling, tactile sensing, force/torque feedback, or low-level control interfaces for manipulation.
  • Background in 3D perception, geometric representations, or learned representations relevant to physical interaction.
  • Experience with reinforcement learning in continuous control, model-based methods, or real-time policy execution.
  • A strong interest in building robust, real-world robotic systems and motivation to see research ideas validated through physical experiments rather than simulation alone.
  • Track record of publications in top AI or robotics conferences (CoRL, ICRA, IROS, RSS, NeurIPS, CVPR, ICML).

Responsibilities

  • Learning from Human Demonstrations: Develop and refine scalable frameworks for the transfer of human-object interaction skills to diverse robotic embodiments.
  • Large-Scale Data Synthesis: Architect autonomous pipelines to process large volumes of visual data and glove-collected human data, extracting the spatial and contact-rich information necessary for generalist robot training.
  • Generative Embodied AI: Implement state-of-the-art generative architectures to synthesize physically grounded, high-fidelity trajectories based on human reference motions.
  • Unified Policy Training: Explore cross-embodiment representations that enable joint training on human and robot data to improve generalization in unstructured environments.
  • Sim-to-Real Deployment: Research and optimize distillation and retargeting techniques to bridge the gap between simulation-trained policies and physical robotic deployment.
  • Semantic Scene Understanding: Utilize vision-language foundation models to autonomously segment skills and extract task-relevant parameters from complex human activities.