Machine Learning Engineer, Model Evaluations (Speech LLM) - San Francisco

Plaud | San Francisco, CA
$180,000 - $270,000 | Hybrid

About The Position

Plaud is building the world's most trusted AI work companion for professionals to elevate productivity and performance through note-taking solutions, loved by over 1,500,000 users worldwide since 2023. With a mission to amplify human intelligence, Plaud is building the next-generation intelligence infrastructure and interfaces to capture, extract, and utilize what you say, hear, see, and think. Plaud Inc. is a Delaware-incorporated, San Francisco-based company pushing the boundary of human–AI intelligence through a hardware–software combination. With SOC 2, HIPAA, GDPR, ISO27001, ISO27701, and EN18031 compliance, Plaud is committed to the highest standards of data security and privacy protection.

Requirements

  • Strong software engineering skills (especially in Python)
  • Experience building reliable distributed systems, data pipelines, or evaluation harnesses that can run at scale against live model checkpoints.
  • Ability to partner with ML researchers to define measurable benchmarks for Speech LLMs.
  • Experience building and owning dashboards that track model health during training.
  • Ability to rapidly debug anomalous mid-training results.
  • Ability to communicate complex statistical results and model behaviors clearly to both technical and non-technical stakeholders.

Nice To Haves

  • Deep familiarity with both traditional audio metrics (WER, CER, PESQ, etc.) and modern evaluation frameworks (e.g., automated MOS scoring).
  • Experience using frontier models or finetuning multi-modal LLMs to evaluate the conversational logic, transcription accuracy, audio quality, and reasoning of audio models.
  • Experience managing large-scale crowdsourcing operations or preference data collection to support RLHF/DPO efforts.
  • A strong background in statistics and experimental design, paired with experience building trusted tracking dashboards (e.g., Weights & Biases, MLflow).
  • Experience curating complex datasets to test edge cases, such as heavy accents, overlapping speech, or highly noisy acoustic environments.
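To make the "traditional metrics" above concrete: word error rate (WER) is the word-level edit distance between a reference transcript and an ASR hypothesis, normalized by the reference length. A minimal sketch (illustrative only, not Plaud's internal tooling):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```

Character error rate (CER) is the same computation over characters instead of words; PESQ and MOS scoring require perceptual models rather than edit distance.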

Responsibilities

  • Turn ambiguous, subjective concepts like a voice's naturalness, expressiveness, or conversational cadence into clear, defensible, and automated metrics that researchers and leadership can rely on.
  • Build reliable distributed systems, data pipelines, and evaluation harnesses (primarily in Python) that run at scale against live model checkpoints.
  • Partner deeply with ML researchers to define exactly what "good" looks like for a Speech LLM, translating capabilities (like ASR robustness in noisy environments or TTS emotional steerability) into measurable benchmarks.
  • Build and own dashboards that track model health during training, improving signal-to-noise ratios, reducing evaluation latency, and making performance regressions impossible to miss.
  • Rapidly debug anomalous mid-training results to determine whether a drop in performance stems from the model architecture, corrupted data, or infrastructure.
  • Communicate complex statistical results and model behaviors clearly to both technical and non-technical stakeholders.

Benefits

  • Competitive Compensation: $180K - $270K base salary + performance bonus + equity.
  • Comprehensive Benefits: Top-tier healthcare for employees and dependents, including dental and vision, and a generous employer subsidy.
  • Retirement Planning: 401(k) plan for full-time employees with company matching.
  • Paid Time Off: Unlimited PTO, plus 13 paid holidays.
  • New Parent Leave: 12 weeks of paid time off to spend time with your new family, regardless of gender.
  • Hybrid Office: Minimum of 3x in-office per week to foster highly collaborative, fast-paced research.
  • Gear & Perks: Choice of top-of-the-line laptops/workstations, annual offsites, and a fully stocked office.