The role involves serving as a domain expert in computer vision and collaborating with researchers from other modalities to drive cutting-edge research in native multimodal foundation models. This includes novel architecture design and modeling for '2D + time' and '3D + time' scenarios. The position also requires exploring the training and design of large models for understanding and generating representations of the physical world, multimodal reasoning, and self-evolving continual learning. Staying up to date with the latest advancements in academia and industry is crucial, as is actively participating in international conferences and workshops and engaging with leading global research teams. Additionally, the role involves contributing impactful research outcomes to the open-source community or transferring technologies to internal product teams.
Career Level
Senior
Industry
Broadcasting and Content Providers
Education Level
Master's degree
Number of Employees
5,001-10,000 employees