Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. This posting is aimed at researchers who are completing, or have recently completed, a PhD and want to do their best work at a fast-moving frontier lab.

This role is broader than a traditional RL algorithm role. You’ll be expected to understand modern post-training methods and help build the infrastructure needed to run them at scale. The work spans RL method development, rollout generation, reward modeling, policy optimization, evaluation, data feedback loops, serving, observability, and distributed execution.

You’ll help build Nuance’s RL/post-training stack from 0→1 and scale it from 1→10. That means turning rapidly evolving research ideas into reliable training systems: defining the abstractions, choosing or modifying frameworks, wiring together rollout workers and trainers, building reward/evaluation loops, debugging failure modes, and making the system fast enough for researchers to iterate.

For Nuance, post-training is not limited to text. Our models are omni from the ground up: audio, video, language, and real-time full-duplex interaction. We need RL and post-training methods that improve interactive behavior, timing, interruption, emotional response, audiovisual coherence, and real-time conversational quality.

This is a high-ownership role with direct impact on how Nuance models improve after pretraining, and a place to grow fast alongside people who’ve built these systems before.
Job Type: Full-time
Career Level: Entry Level
Education Level: Ph.D. or professional degree