Head Of Product - Model Evaluation

Applause


About The Position

As Head of Product - Model Evaluation at Applause, you will lead the development of a strategic new AI evaluation platform and play a foundational role in bringing it to market from the ground up. You will define the product vision, identify target customer personas, and shape the go-to-market strategy, while building the systems that measure, monitor, and improve AI models in production. As the business matures, you will serve as its product leader. This position reports directly to the CTO.

This is a rare opportunity to combine 0→1 product building with enterprise-scale impact. You’ll create a new category for Applause, extending our leadership in digital quality into AI, by developing capabilities such as LLM-as-a-judge systems, human-in-the-loop feedback pipelines, and model observability frameworks. You won’t just refine an existing product; you’ll define what this business becomes and how it wins in the market, and then build the team that scales it.

Requirements

5+ years of product management experience, including work on AI/ML-driven or data products
Experience owning or contributing to 0→1 product development and go-to-market strategy
Demonstrated ability to take a product from concept to paying customers
Strong ability to define customer personas, value propositions, and product positioning
Experience working closely with data science or machine learning teams
Comfortable engaging at a technical level with ML engineers: able to discuss model architectures, evaluation metrics, and API design without needing a translator
Strong understanding of metrics, experimentation, and data quality
Experience with B2B/enterprise products and technical buyer personas

Nice To Haves

Background in data science, machine learning, or engineering
Experience with model evaluation systems, including human and/or automated approaches
Familiarity with LLM-as-a-judge, pairwise ranking, or preference modeling
Understanding of the limitations and tradeoffs in evaluating generative AI systems
Experience at AI evaluation/tooling companies (e.g., Arize, LangSmith, Galileo, Scale AI, Weights & Biases, or Humanloop) or as a buyer/implementer of such tools at an enterprise
MBA or equivalent experience in high-growth, product-led organizations

Responsibilities

Define the vision, positioning, and roadmap for Applause’s AI evaluation offering
Identify and develop target customer personas (e.g., AI platform teams, ML leaders, product orgs)
Design and execute the go-to-market strategy, including packaging, pricing, and initial sales motions
Partner with sales and marketing to validate demand, refine messaging, and drive early adoption
Translate market feedback into rapid product iteration and differentiation
Partner with data science and ML teams to design closed-loop systems: model → evaluation → insight → improvement
Bring LLM-as-a-judge systems into production use for grading, ranking, and preference modeling
Partner with ML teams on iteration strategies (prompting, fine-tuning, data collection)
Ensure evaluation outputs translate into actionable improvements for customers
Leverage Applause’s global testing community to design scalable human evaluation pipelines
Define workflows, annotation schemas, and quality controls to produce gold-standard datasets
Balance quality, cost, and latency to meet customer requirements
Act as the bridge between product, data science, engineering, and go-to-market teams
Translate complex technical capabilities into clear, differentiated product offerings
Drive alignment on priorities, success metrics, and execution plans