Machine Learning Engineer, Data Quality

Rime Labs • San Francisco Bay Area, CA • Remote

About The Position

Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand.

We started from a different premise than the rest of the field: voice AI isn't bottlenecked by model architecture. It's bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That's our moat. It's also why enterprises pick Rime when pilots need to convert into production.

We're backed by top-tier investors including Unusual Ventures, and we've built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it. The path is the craft itself: the loop between theory and practice, in which the shared mental model of how things should behave meets a reality that doesn't quite conform and is sharpened by the meeting.

Role Overview

We're hiring a Machine Learning Engineer, Data Quality to own the operational data pipeline that produces our training corpus end-to-end, and to bring a vision for where it should go next. We take that seriously: if you can plan an overhaul, justify it, and orchestrate the human and machine migration work, we'll do it together.

This is a sociotechnical role. You'll be in the loop on everything and talking to everyone who touches the data across 42+ languages: 50+ annotators, 32+ external vendors, and an in-house recording studio, plus the systems behind them: ingestion, quality assurance, pre-processing, cataloging, and export to training. At any given moment, dozens of deliverables are in flight, each on its own clock. The people who thrive here want to listen to the audio clips and design the system that scales their judgment to the next million.

You don't need deep expertise across the whole stack on day one. You need the judgment to know what good looks like at each stage, and the engineering depth to build (or learn to build) the parts that need building.

Requirements

Instinct for data quality. You can tell good data from bad. You know what "bad" looks like in this specific domain — not just generic "anomalies," but the particular ways audio and transcripts go wrong.
Willing to look at the data. Open the file. Listen to the clip. Read the transcript. You don't outsource the first-pass checks to a script.
Opinionated, and curious when challenged. You arrive with a perspective informed by what you've seen work and what you've seen fail — and you're equally interested in pressure-testing it. A "what about..." question isn't a threat; it's where the work happens.
Project sense. You can hold a lot of moving parts in your head — what's in flight, what's blocked, what's about to slip — and keep the picture clear enough that others can step into it.
Designs, doesn't just execute. You want to take on more design responsibility over time, not less. You're looking for a role where you (co-)own things end-to-end, not one where someone hands you tasks to implement.
Comfort being out of your depth at the boundary. You'll sometimes debug code you didn't write in tools you don't use daily. You should find this energizing, not threatening.
Solid software and data engineering fundamentals. Python, schemas you can reason about, production data pipelines you've built and operated on cloud-native infrastructure.

Nice To Haves

Audio pipeline tooling: ffmpeg, Silero VAD, faster-whisper, neural audio codecs (EnCodec, SNAC, SoundStream).
TTS frontend work: G2P (phonemizer, g2p-en), text normalization (NeMo TN or equivalent), prosody and phoneme alignment.
Annotation platforms: Label Studio, Argilla, or equivalent, particularly customizing or replacing them.
Direct experience with our stack: GCP (Cloud Run, Cloud Batch, GCS, Pub/Sub), Supabase / Postgres. AWS or Azure experience maps fine.

Responsibilities

Linguist- and annotation-team-facing tooling: annotation UI, project-management workflows, QC dashboards.
Vendor data QA workflows: A large share of incoming data arrives from vendors in various states and needs to pass QA before it can be trusted. The tooling, routing, and tracking for that work is yours.
Quality systems across the network: The signals, dashboards, and review loops that surface when a corner of the network is drifting (a vendor's transcripts getting sloppy, an annotator's IAA slipping, a language's gold set going stale) before the affected data lands in the training pool.
End-to-end audio annotation pipeline: Some stages currently exist only as prototypes; productionizing and rebuilding them is already in flight.
Dataset versioning and experimenter tooling: The model team will want to subset the vetted pool ("speakers X/Y/Z, duration 3–12s, quality > 0.8") into reproducible training manifests. The query interface, manifest format, and lineage tracking are all yours.
Pipelines for full- and half-duplex training data.

Benefits

Competitive base salary plus meaningful early-stage equity
High ownership, high standards, low bureaucracy
Remote-friendly
Visa sponsorship available
Access to a proprietary, full-duplex, studio-quality conversational speech corpus
Compute and tooling to do the work
Direct influence on the future of voice AI
