PhD Research Intern - Multimodal AI, Audio (Fall 2026, Atlanta)

Sound, Visual, & Display Technology•Atlanta, GA

Onsite

About The Position

Join the leader in entertainment innovation and help us design the future. The Advanced Technology Group (ATG) is the research division of the company. ATG’s mission is to look ahead, deliver insights, and innovate technological solutions that will fuel Dolby’s continued growth. As a valued member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continuously push the boundaries of audio, imaging, and cloud technology to create spectacular entertainment experiences.

As a diverse and dynamic group, our ATG researchers work on cutting-edge projects related to computer science and electrical engineering for audio, video, and cloud technologies, exploring exciting domains such as AI/ML, algorithms, digital signal processing, audio processing, image processing, computer vision, AR/VR, data science & analytics, distributed systems, cloud, edge & mobile computing, computer networking, and IoT.

The Dolby U internship program offers impactful, project-based work experience in a collaborative, creative environment where you work with industry leaders. It is a great way to get exposure to cool technologies, hone your skills, and build a pathway toward full-time opportunities with Dolby.

Dolby Laboratories is looking for a self-motivated, talented individual interested in applying their theoretical and practical expertise to the development of new technologies. The position is in the Multimodal Processing Team, within the Advanced Technology Group of Dolby Laboratories. It will involve working in close collaboration with other research scientists/engineers/AI researchers in multiple locations.

Requirements

Working towards a Ph.D. degree in Artificial Intelligence, Electrical Engineering, Computer Science, or related field.
Experience developing and training deep learning architectures.
Experience working with deep learning architectures for audio and/or video applications.
First-author publications at top-tier peer-reviewed AI conferences (CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, InterSpeech, ICASSP, etc.).
Programming experience in Python, and experience working with frameworks like PyTorch or TensorFlow.
Ability to prototype quickly, with adept critical thinking skills.
Excellent communication skills and a team-oriented work ethic.
Recent graduates within six months of graduation are also eligible to apply.
Must be available to work full-time, Monday to Friday, for 12 weeks between September 2026 and December 2026.

Nice To Haves

Multimodal machine learning and deep learning.
Multimodal generative AI.
Audio and/or Video quality evaluation.
Audio-visual content analysis and enhancement.
Multimodal representation learning.

Responsibilities

Develop innovative AI algorithms designed to process audio and/or video data for applications in audio-visual quality evaluation, audio-visual content analysis, and multimodal representations.
Contribute to the definition of new formats and evaluation pipelines for generative AI media.