CV0267: Internship - Audio-Visual Learning for Spatial Audio Processing

Mitsubishi•Cambridge, MA

76d•$6,000 - $8,000

About The Position

MERL is looking for a highly motivated intern to work on an original research project on audio-visual learning, with a focus on spatial audio and on training models with limited labeled data. A strong background in computer vision, audio processing, and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, Room Impulse Response (RIR) estimation, and large (vision-) language models is a plus. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks such as PyTorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publication. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last at least 3 months.

Requirements

Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI.
Knowledge of the latest self-supervised and weakly-supervised learning techniques.
Experience with large (vision-) language models and spatial audio processing techniques.
Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or TensorFlow.