Machine Learning Scientist

Rime Labs
Remote

About The Position

Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand.

We started from a different premise than the rest of the field: voice AI isn't bottlenecked by model architecture. It's bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That's our moat. It's also why enterprises pick Rime when pilots need to convert into production.

We're backed by top-tier investors including Unusual Ventures, and we've built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it.

We're hiring a Machine Learning Scientist to push the frontier of speech synthesis and speech understanding at Rime.

Requirements

  • Deep familiarity with the speech synthesis literature, contemporary and historical: Tacotron, FastSpeech, VITS, VALL-E, the codec-LM lineage. Opinions on what worked and why.
  • Hands-on experience training neural codecs (EnCodec, DAC, Mimi, etc.) and working with multiple representation choices.
  • Experience with full- or half-duplex multi-modal modeling (Moshi, LLaMA-Omni, streaming S2S).
  • Strong attention to detail on data quality. You notice when an annotation pipeline is silently degrading or when an eval set has leakage.
  • Willing to roll up your sleeves on unglamorous data and training work, with the agency to build pipelines so the team isn't stuck doing it by hand.
  • Working knowledge of TTS frontends (G2P, text normalization, prosody) and experience working with linguists.
  • Strong PyTorch fundamentals. Comfortable with training loops, distributed training, model internals.
  • PhD or equivalent research experience in speech, audio, ML, or computational linguistics, or a track record that makes the credential irrelevant.

Nice To Haves

  • Multilingual TTS experience.
  • Background in prosody or paralinguistics.
  • Published work in speech, audio, or core ML venues.
  • Experience taking research models to production: quantization, distillation, streaming inference.

Responsibilities

  • Design, train, and evaluate speech synthesis models, autoregressive and non-autoregressive.
  • Drive research on full-duplex and half-duplex multi-modal architectures, including unified S2S systems.
  • Choose and iterate on speech representations: neural codecs, semantic tokens, mel features, continuous latents.
  • Build rigorous evaluations, both objective and perceptual. Hold the bar on quality and prosodic control.
  • Collaborate with our linguists on TTS frontend behavior so modeling and frontend choices reinforce each other.

Benefits

  • Competitive base + meaningful early-stage equity
  • Remote-friendly
  • Visa sponsorship available
  • Access to a proprietary, full-duplex, studio-quality conversational speech corpus
  • Compute and tooling to do the work
  • Direct influence on the future of voice AI
© 2026 Teal Labs, Inc