Research Staff, Voice AI Foundations

Jobgether

About The Position

This role offers the opportunity to drive breakthrough research in voice AI, tackling fundamental challenges in speech understanding, synthesis, and large-scale audio modeling. You will work on next-generation neural audio codecs, generative voice models, and embedding systems that enable precise control over speaker, style, and environment. The position involves designing scalable data pipelines, model architectures, and efficient inference algorithms for world-scale voice applications. You will operate in a fast-paced, AI-first environment that values experimentation, creativity, and rapid iteration. This role is ideal for researchers who are passionate about bridging theory and practice, and who thrive on solving open problems and building technologies that can reach millions of users globally. Collaborating with cross-functional teams and contributing to open-source or foundational research will be integral to success.

Requirements

Strong foundation in statistical learning, self-supervised learning, and multimodal AI.
Deep expertise in foundation model architectures and experience scaling them across large datasets.
Demonstrated ability to bridge theoretical research with practical implementation.
Experience building data pipelines to process, curate, and maintain large, diverse audio datasets.
Track record of rigorous experimental design, controlled evaluations, and reproducible research.
Knowledge of hardware constraints and model optimization techniques for real-world deployment.
Research publications or open-source contributions in speech, language, or multimodal AI.
Excellent problem-solving skills, creativity, and ability to operate in a fast-changing environment.

Responsibilities

Pioneer the development of latent-space models to address scale, cost, and data challenges in voice AI.
Build and optimize neural audio codecs for low bit-rate compression and high-fidelity reconstruction across diverse datasets.
Develop steerable generative models to synthesize human speech across multiple contexts, speakers, and environments.
Create embedding systems to disentangle speaker, content, style, and environmental dimensions for precise control and dataset augmentation.
Design scalable model architectures, training strategies, and inference algorithms optimized for hardware efficiency.
Conduct rigorous experiments to validate new architectures and model innovations, leveraging large-scale audio datasets.
Collaborate with research and engineering teams to integrate findings into production-grade voice AI systems.

Benefits

Holistic health coverage including medical, dental, and vision plans.
Annual wellness and mental health stipends, plus life and disability insurance plans.
Flexible schedule with unlimited PTO, generous parental leave, and paid US holidays.
Stipends for home office setup and personal productivity.
401(k) plan with company match and tax savings programs.
Learning and education stipends, plus participation in conferences, AI workshops, and employee resource groups.
Global remote work options, with benefits administered according to local regulations for international employees.
