Research Scientist Intern, Audio Quality with AI (PhD)

Meta, Redmond, WA
$7,650 - $12,134 per month

About The Position

The Meta Reality Labs Research Team is seeking an intern passionate about speech perception and audio quality to investigate why processed speech sometimes sounds degraded or robotic. The project focuses on identifying systematic phonemic errors as causal factors in perceived quality degradation, and linking these errors to human quality and intelligibility judgments. A core method is to explore the capabilities of audio vs. video LLMs. This is fundamentally a speech-perception research role; multimodal/LLM methods are a supporting tool rather than the central focus. Our internships are twelve (12) to twenty-four (24) weeks long and we have various start dates throughout the year.

Requirements

  • Currently has, or is in the process of obtaining, a PhD degree in the field of Speech and Hearing Science, Auditory Neuroscience, Computational Neuroscience, Computer Science, Artificial Intelligence, Generative AI, Transformer Models, Machine Learning, Signal Processing or Computer Vision
  • 3+ years of experience with Python, MATLAB, or similar
  • 3+ years of experience with machine learning software platforms such as PyTorch or TensorFlow
  • Background in speech perception, psychoacoustics, or acoustic phonetics
  • Experience deploying novel audio computational models and LLMs
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Nice To Haves

  • Experience building novel audio computational models and LLMs
  • Demonstrated software engineering experience via an internship, work experience, coding competitions, or widely used contributions in open source repositories (e.g. GitHub)
  • Experience in advancing AI techniques, including core contributions to open source libraries and frameworks in computer vision or audio processing
  • Experience with audio and speech quality assessment
  • Experience with multichannel audio processing
  • Experience in visual and acoustic scene analysis
  • Experience manipulating and analyzing complex, large scale, high-dimensionality data from varying sources
  • Proven track record of achieving significant results as demonstrated by grants, fellowships, patents, as well as first-authored publications at leading workshops or top computer vision and machine learning conferences such as ARO, ASA, NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, ICCV, ECCV, ICASSP, InterSpeech or similar
  • Experience in utilizing theoretical and empirical research to solve problems
  • Experience working and communicating cross-functionally in a team environment
  • Intent to return to a degree program after the completion of the internship/co-op

Responsibilities

  • Investigate systematic phonemic errors as causal factors in perceived speech quality degradation, and link them to human perceptual judgments
  • Build and curate datasets and benchmarks of speech for phoneme-level analysis
  • Explore and compare the capabilities of audio and video (multimodal) LLMs as tools to support this analysis
  • Relate findings to human perceptual data (quality preference and intelligibility) and translate them into actionable insights for research and engineering teams
  • Where appropriate, adapt multimodal models to the task in a supporting capacity
  • Collaborate with researchers, engineers, and cross-functional partners to define goals, communicate findings, and drive improvements in speech quality
  • Develop tools and infrastructure to streamline and scale the analysis
  • Stay current with research in speech perception and audio quality and intelligibility assessment, and incorporate best practices into Meta's workflows
  • Disseminate results through internal reports and presentations, and, when appropriate, external publications

Benefits

  • $7,650/month to $12,134/month + benefits