Hunyuan Multimodal Reinforcement Learning Research Intern

Tencent • Palo Alto, CA

About The Position

Location

US-California-Palo Alto

The expected base pay range for this position in the location(s) listed above is $80,168.40 to $124,800.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Requirements

Currently enrolled as a PhD student in Computer Science or a closely related field.
Demonstrated strong research capability, with publications in top-tier conferences such as ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, and SIGGRAPH.
Strong hands-on programming skills, with solid experience in deep learning system implementation, model training and inference optimization, CPU/GPU acceleration, and distributed training and inference.

Nice To Haves

Prior experience with diffusion models, autoregressive models, and/or text-to-image or text-to-video generation is highly preferred.
Participation in competitive programming contests such as ACM/NOIP is a strong plus.

Responsibilities

Conduct research on RL algorithms for multimodal models, including diffusion models for image, video, and 3D generation, autoregressive models for multimodal understanding, and potentially unified multimodal frameworks.
Design and develop RL infrastructure and reward modeling strategies to enable efficient large-scale training, improve training stability, and mitigate reward hacking and related failure modes.
Explore next-generation RL paradigms that more directly and effectively learn from environment feedback.

Benefits

This position will be eligible for 1 hour of paid sick leave for every 30 hours worked and up to 13 paid holidays throughout the calendar year.
Subject to the terms and conditions of the applicable plans then in effect, full-time interns are also eligible to enroll in the Company-sponsored medical plan.
