MERL is seeking a highly motivated and qualified intern to conduct research on applying foundation models to robotic manipulation. The focus will be on leveraging large-scale pretrained models (e.g., vision-language models, multimodal transformers, diffusion policies) to enable generalist manipulation capabilities across diverse objects, tasks, and embodiments, including humanoids. Potential research topics include few-shot policy learning, multimodal grounding of multiple sensor modalities to robot actions, and adapting foundation models for precise, high-success-rate control. Experience working with humanoids is a plus.

The ideal candidate will be a senior Ph.D. student with a strong background in machine learning for robotics, particularly in areas such as foundation models, imitation learning, reinforcement learning, and multimodal perception. Knowledge of large-scale Vision-Language-Action (VLA) and multimodal foundation models is expected.

The internship will involve algorithm design, model fine-tuning, simulation experiments, and deployment on physical robot platforms equipped with cameras, tactile sensors, and force/torque sensors. The successful candidate will collaborate closely with MERL researchers, with the expectation of publishing in top-tier robotics or AI conferences/journals. Interested candidates should apply with an updated CV and relevant publications.
Career Level: Intern
Education Level: Ph.D. or professional degree
Number of Employees: 5,001-10,000 employees