Technical Program Manager, Human Evaluation Operations

Microsoft
Redmond, WA
$100,600 - $199,000

About The Position

Microsoft AI (MAI) is building the world’s most advanced AI systems, and rigorous, scalable human evaluation is foundational to ensuring our models are safe, aligned, and high‑quality. The Human Evaluation Operations (Human Eval Ops) team powers this by running one of the largest and most reliable human‑in‑the‑loop pipelines at Microsoft.

We are hiring two Technical Program Managers to join this team and own end‑to‑end evaluation operations for model quality, safety, and capability development. These TPMs will partner closely with product squads, engineers, data scientists, researchers, and external annotation vendors to deliver high‑quality human evaluations at scale. You will drive programs that ensure MAI has the people, processes, training pipelines, and tooling needed to enable fast, trustworthy, and efficient evaluation across a wide range of AI tasks. This is a highly cross‑functional, execution‑oriented TPM role, ideal for someone who thrives in operational complexity, is deeply organized, and loves working at the intersection of people, process, and product quality.

Requirements

  • Bachelor's Degree AND 2+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • 1+ year(s) of experience managing cross-functional and/or cross-team projects.

Nice To Haves

  • 3+ years of experience in technical program management, operations management, or data operations, or equivalent experience.
  • 1+ year(s) of experience reading and/or writing code (e.g., sample documentation, product demos).
  • Experience working cross‑functionally with engineering, research, PM, vendors, and operations partners.
  • Experience managing vendor relations or external workforce programs.
  • Strong analytical skills and comfort working with dashboards, metrics, and evaluation data.
  • Experience running human‑in‑the‑loop data pipelines (e.g., annotations, RLHF, safety evals, quality assurance, crowdsourcing).
  • Familiarity with LLM and AI model evaluation practices and with data annotation platforms and systems.
  • Ability to quickly understand product quality signals, debug task design issues, and iterate with engineering teams.
  • Experience in operational excellence, process automation, or scaling manual workflows.

Responsibilities

  • Lead Human Evaluation Programs: Drive end‑to‑end human evaluation workflows supporting model quality, safety, and capability initiatives across MAI. Coordinate evaluation planning, task design alignment, and delivery with product squads, engineering, and research partners.
  • Manage Evaluation Workforce & Readiness: Oversee the health, performance, and scalability of MAI’s human evaluation workforce—including onboarding, qualification, training, and continuous performance management—to ensure reliable, high‑quality evaluation signals.
  • Operational Excellence & Quality Governance: Maintain high operational standards across human‑in‑the‑loop pipelines by monitoring quality signals, resolving issues, and guiding teams toward consistent, trustworthy evaluation outcomes.
  • Cross‑Functional Program Leadership: Partner with product squads to scope evaluation needs, define instructions and scorecards, support experimentation, and ensure teams are equipped to use human evaluations effectively.
  • Platform & Vendor Partnership Management: Represent MAI needs to platform and vendor partners, shaping their roadmaps and ensuring capacity, reliability, and compliance with MAI standards.
  • Insights, Tooling, & Documentation: Provide evaluation insights to product teams, maintain essential documentation, and influence the evolution of internal tools, dashboards, and processes that enable scalable human evaluations.
  • Specialized Evaluation Programs: Support domain‑specific or advanced evaluation initiatives (e.g., expert reviews, structured scoring programs) in collaboration with MAI stakeholders.