PhD Research Intern - Multimodal AI (Fall 2026)

Dolby Laboratories, Inc.•Brisbane, CA

Onsite

About The Position

Join the leader in entertainment innovation and help us design the future. The Advanced Technology Group (ATG) is the research division of the company. ATG’s mission is to look ahead, deliver insights, and innovate technological solutions that will fuel Dolby’s continued growth. As a valued member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continuously push the boundaries of audio, imaging, and cloud technology to create spectacular entertainment experiences.

As a diverse and dynamic group, our ATG researchers work on cutting-edge projects related to computer science and electrical engineering for audio, video, and cloud technologies, exploring exciting domains such as AI/ML, algorithms, digital signal processing, audio processing, image processing, computer vision, AR/VR, data science & analytics, distributed systems, cloud, edge & mobile computing, computer networking, and IoT.

The Multimodal Lab is looking for a talented, self-motivated PhD student to explore multimodal AI models for source separation and for spatial media content creation and generation. This is a research-focused role ideal for candidates passionate about pushing the boundaries of audio-visual AI at the intersection of deep learning, signal processing, and generative media.

Requirements

Currently enrolled in a PhD program in Computer Science, Electrical Engineering, Applied Mathematics, or a closely related field
Strong background in deep learning with a proven ability to apply it to multimedia research challenges
Deep familiarity with leading AI model paradigms, including large language models (LLMs) and generative models (e.g., diffusion, VAE, GAN)
Proficiency in Python and at least one deep learning framework
Strong mathematical foundation
Excellent written and verbal communication skills

Responsibilities

Develop and apply multimodal AI architectures that integrate audio, visual, and/or language modalities for joint understanding and generation
Design, implement, and train advanced multimodal AI models for spatial media content creation and generation, including multimodal source separation and localization
Prepare and curate high-quality datasets through data augmentation and synthetic data generation
Evaluate proposed models against state-of-the-art research benchmarks
Prototype and validate the developed algorithms in realistic use cases
Present research findings and contribute to patent applications and scientific publications
