Product and Research Operations Manager

Cartesia•San Francisco, CA

$160,000 - $190,000•Onsite

About The Position

Cartesia is seeking a Product and Research Operations Manager to design, scale, and operate its global, large-scale evaluation workforce. The role sits at the intersection of product operations, data operations, and vendor management, and is crucial for ensuring model quality and customer outcomes. The successful candidate will own the entire workforce system, including hiring pipelines, vendor strategy, workforce planning, quality control, and operational performance. The position involves translating product needs into operational workflows and collaborating with product, engineering, data, and customer-facing teams to support large-scale, real-world AI evaluation.

Requirements

5+ years of experience in operations, workforce management, or data annotation systems.
Experience managing large contractor or vendor-based workforces.
Proven ability to scale operations from zero to production.
Systems thinking with the ability to design scalable operational frameworks.
Strong analytical skills with comfort around metrics like inter-rater reliability, precision, and throughput.
Ability to execute quickly under ambiguity with close attention to quality and edge cases.

Nice To Haves

Experience in AI/ML data operations or evaluation pipelines.
Background in audio, speech, or language-related workflows.
Familiarity with QA systems and annotation tooling.
Experience with marketplace platforms such as Upwork or Mercor.
Exposure to multilingual operations.

Responsibilities

Design and implement a workforce structure across languages, skill tiers, and use cases, including evaluators, auditors, and leads for text-to-speech (TTS) products.
Build capacity models to support continuous evaluation pipelines and data production workflows.
Own relationships with vendors such as data annotation firms and contractor platforms, negotiating rate cards, SLAs, and throughput guarantees.
Decide on build, buy, or hybrid workforce models and continuously benchmark cost and performance across regions.
Design multi-layer QA systems spanning self-checks, peer review, audits, and gold tasks.
Define and track inter-rater reliability, error rates by category, and annotator-level performance distributions.
Build escalation and retraining workflows to maintain quality at scale.
Run day-to-day operations including task allocation, throughput tracking, and SLA adherence.
Build systems to reduce evaluator fatigue, rotate task types, and maintain consistency across large-scale evaluations.
Partner with tooling teams to improve evaluator UX and with data teams to ensure clean, structured outputs for model training.