Vision Researcher - Multimodal Understanding & Generation in Foundation Models

Tencent•Bellevue, WA

70d•$149,000 - $279,800

About The Position

The role involves serving as a domain expert in computer vision and collaborating with researchers from other modalities to drive cutting-edge research in native multimodal foundation models. This includes novel architecture design and modeling for '2D + time' and '3D + time' scenarios. The position also requires exploring the training and design of large models for understanding and generating representations of the physical world, multimodal reasoning, and self-evolving continual learning. Staying up to date with the latest advancements in academia and industry is crucial, as is actively participating in international conferences and workshops and engaging with leading global research teams. Additionally, the role involves contributing impactful research outcomes to the open-source community or transferring technologies to internal product teams.

Requirements

Master's or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field.
Proven multimodal research experience in relevant areas, familiarity with state-of-the-art technologies, and a strong publication record in top-tier conferences or journals such as CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML.
Proficiency with mainstream open-source tools and frameworks relevant to the field, and strong engineering skills to support research implementation; candidates with influential GitHub projects or contributions to high-impact open-source communities are preferred.
Strong team spirit and ability to collaborate across disciplines, excellent communication skills, intellectual curiosity, and a goal-oriented, problem-solving mindset.

Responsibilities

Serve as a domain expert in computer vision and collaborate with researchers from other modalities to drive cutting-edge research in native multimodal foundation models, including novel architecture design and modeling for '2D + time' and '3D + time' scenarios.
Explore the training and design of large models for understanding and generating representations of the physical world, multimodal reasoning, and self-evolving continual learning.
Stay up to date with the latest advancements in academia and industry; actively participate in international conferences and workshops, and engage with leading global research teams.
Contribute impactful research outcomes to the open-source community or transfer technologies to internal product teams.

Benefits

Sign-on payment eligibility
Relocation package eligibility
Restricted stock units eligibility
Medical benefits
Dental benefits
Vision benefits
Life and disability benefits
Participation in the Company's 401(k) plan
15 to 25 days of vacation per year (depending on tenure)
Up to 13 days of holidays throughout the calendar year
Up to 10 days of paid sick leave per year

What This Job Offers

Career Level

Senior

Industry

Broadcasting and Content Providers

Education Level

Master's degree

Number of Employees

5,001-10,000 employees
