Machine Learning Engineer, Data

Rime Labs
San Francisco, CA (Remote)

About The Position

Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand. We started from a different premise than the rest of the field: voice AI isn't bottlenecked by model architecture. It's bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That's our moat. It's also why enterprises pick Rime when pilots need to convert into production. We're backed by top-tier investors including Unusual Ventures, and we've built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it.

Requirements

  • Strong software engineering fundamentals: Python, distributed systems, comfort across the stack.
  • Database design fluency: you reach for the right schema and have operated Postgres or similar in production.
  • Production data pipelines on cloud-native infrastructure (GCP preferred). Our data stack is currently GCP-dominant.
  • Operational comfort: containers, CI/CD, IAM, cost-aware infrastructure choices, etc.
  • Strong attention to detail on data quality.
  • Comfort being out of your depth at the boundary. You'll sometimes debug code you didn't write in tools you don't use daily. You should find this energizing, not threatening.
  • Bias toward building the abstractions so the modeling team doesn't stay stuck doing data work by hand.

Nice To Haves

  • Multilingual data pipeline experience.
  • Audio DSP, signal processing, or speech recognition background.
  • Large-scale training infra (FSDP, DeepSpeed, Ray).
  • Annotation tooling and human-in-the-loop systems.
  • Comfort working close to research teams.

Responsibilities

  • End-to-end audio annotation pipeline: some stages exist today as prototypes; productionizing and rebuilding them is in-flight work you would own.
  • Quality systems: Automated tooling to catch annotation errors, alignment drift, and silent regressions before training runs.
  • Dataset versioning and experimenter tooling: the model team will want to subset the vetted pool ("speakers X/Y/Z, duration 3–12s, quality > 0.8") into reproducible training manifests. The query interface, manifest format, and lineage tracking are all yours.
  • Linguist- and annotation-team-facing tooling: annotation UI, project-management workflows, QC dashboards.
  • Pipelines for full- and half-duplex training data.
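The dataset-versioning responsibility above might look roughly like the following: a hypothetical sketch (field names, schema, and the hashing scheme are illustrative assumptions, not Rime's actual tooling) of turning a filter query like "speakers X/Y/Z, duration 3–12s, quality > 0.8" into a reproducible, content-addressed training manifest.

```python
import hashlib
import json

# Hypothetical utterance records from a vetted pool.
# Field names are illustrative, not Rime's actual schema.
POOL = [
    {"id": "utt-001", "speaker": "X", "duration_s": 5.2, "quality": 0.91},
    {"id": "utt-002", "speaker": "Y", "duration_s": 2.1, "quality": 0.95},
    {"id": "utt-003", "speaker": "Z", "duration_s": 8.7, "quality": 0.64},
    {"id": "utt-004", "speaker": "X", "duration_s": 11.3, "quality": 0.88},
    {"id": "utt-005", "speaker": "Q", "duration_s": 6.0, "quality": 0.99},
]

def build_manifest(pool, speakers, min_dur, max_dur, min_quality):
    """Filter the pool and return (manifest_lines, manifest_id).

    Sorting the rows and hashing their serialized form gives a stable
    identifier for lineage tracking: the same query over the same pool
    always yields the same manifest ID.
    """
    rows = sorted(
        (r for r in pool
         if r["speaker"] in speakers
         and min_dur <= r["duration_s"] <= max_dur
         and r["quality"] > min_quality),
        key=lambda r: r["id"],
    )
    lines = [json.dumps(r, sort_keys=True) for r in rows]
    manifest_id = hashlib.sha256("\n".join(lines).encode()).hexdigest()[:12]
    return lines, manifest_id

# Example query: speakers X/Y/Z, duration 3-12s, quality > 0.8.
lines, manifest_id = build_manifest(POOL, {"X", "Y", "Z"}, 3.0, 12.0, 0.8)
```

Writing the resulting JSONL lines to a manifest file keyed by `manifest_id` makes any training run reproducible from the query alone.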

Benefits

  • Competitive base salary + meaningful early-stage equity.
  • Remote-friendly.
  • Visa sponsorship available.
  • Access to a proprietary, full-duplex, studio-quality conversational speech corpus.
  • Compute and tooling to do the work.