Human Evaluation - Program Manager

Netflix · New York, NY

About The Position

Netflix is building toward more intelligent and responsive systems, and thoughtful, high-quality evaluation is essential to ensuring progress in the right direction. You will join a team that creates the frameworks, tools, and workflows that ensure human judgment is applied with consistency, clarity, and care across evaluation criteria such as helpfulness, tone, safety, relevance, and creative quality.

In this role, you will shape how human and AI-driven evaluations are designed and own their day-to-day execution, from scoping and planning to rater onboarding and calibration. You will also act as a thought partner and influencer: aligning stakeholders, introducing new ways of working, and establishing a shared language around quality. Your work will ensure AI features are high-performing and aligned with Netflix's values, users, and brand integrity. The role sits within a small team focused on rigorous, well-aligned, and effectively resourced evaluation designs executed at scale.

Requirements

  • 4+ years of experience working in human evaluations, data collection, labeling, or annotation operations in GenAI/ML environments
  • Track record of implementing process improvements or quality control systems for data collection needs
  • Prior experience managing human annotation vendors, raters, or data labeling teams
  • Strong understanding of evaluation design, including guidelines, rubrics, and scoring protocols
  • Proven ability to manage complex, cross-functional programs end to end, with strong program management skills and clear accountability for successful delivery
  • Experience with human labeling platforms
  • Excellent written and verbal communication skills
  • Ability to synthesize feedback into clear recommendations and process improvements
  • Familiarity with responsible AI principles and how to embed them into evaluation design
  • Strong organizational skills and executional focus; ability to track details while seeing the bigger picture

Responsibilities

  • Lead end-to-end execution of human evaluation and data operations initiatives—from intake and scoping to delivery
  • Develop and operationalize frameworks for evaluating GenAI and ML outputs
  • Collaborate across research, product, UX, and engineering to embed evaluation into model development cycles
  • Build and maintain project timelines, proactively manage blockers, and ensure timely execution
  • Develop clear, scalable guidelines and scoring rubrics to ensure consistent rater judgment
  • Oversee rater onboarding, calibration, and QA workflows
  • Define and monitor success metrics such as speed to inter-rater reliability (IRR), throughput, and task effectiveness
  • Pilot and refine evaluation tasks to improve clarity, inter-rater reliability, and feedback quality
  • Build foundational documentation and drive adoption of best practices across teams
  • Track evaluation health and communicate progress to stakeholders clearly and proactively
  • Anticipate and resolve bottlenecks and blockers before they affect delivery
  • Act as the connective tissue across multiple partners to ensure alignment and effective execution of evaluations at scale

Benefits

  • Health Plans
  • Mental Health support
  • 401(k) Retirement Plan with employer match
  • Stock Option Program
  • Disability Programs
  • Health Savings and Flexible Spending Accounts
  • Family-forming benefits
  • Life and Serious Injury Benefits
  • Paid leave of absence programs
  • Flexible time off

What This Job Offers

Job Type

Full-time

Career Level

Manager

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees