Business Unit
Technology Engineering Group (TEG) is responsible for supporting the company and its business groups on technology and operational platforms, as well as for the construction and operation of R&D management and data centers. TEG provides users with a full range of customer services. As the operator of the largest network, devices, and data centers in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open source collaboration, constructing new platforms, and supporting business innovation.
What the Role Entails
About Tencent AI Lab at Seattle Area
Tencent is a leading internet company in China. Tencent AI Lab at Seattle Area was established in May 2017. The lab strives to continuously improve AI's capability in perception, cognition, and creativity. Its researchers aim to solve challenging real-world problems with advanced technologies and publish extensively in top conferences and journals.
Research Internship: Multimodal LLM (Speech/Music/Audio/Vision/Language)
Tencent AI Lab is dedicated to advancing cutting-edge AI technologies, with a particular focus on innovative breakthroughs in large foundation models. The lab's long-term ambition is to drive the development of Artificial General Intelligence (AGI) and, ultimately, Artificial Superintelligence (ASI).
We are seeking research interns for the year 2026 who are interested in developing novel speech/music/audio/vision/language processing techniques and large multimodal models. The positions are based at our Seattle area office in Bellevue, WA. Every research intern will work with researchers on a research project that attacks one of the core problems by inventing cutting-edge techniques. We encourage discussion and collaboration between researchers and interns. Interns are also encouraged to publish the results of their internship.
Our projects span a wide range of areas, including developing more effective multimodal pretraining and post-training strategies for audio, speech, music, image, and video understanding and generation. We aim to enable full-duplex conversations, design more efficient large-model architectures, enhance multimodal memory and reasoning capabilities, and advance novel audio, speech, music, image, and video processing techniques (such as encoding, tokenization, and representation learning), with a focus on multimodal applications and end-to-end large models.
Who We Look For
Career Level
Intern
Education Level
Ph.D. or professional degree
Number of Employees
5,001-10,000 employees