Measuring intelligence is hard, and humans haven't been particularly good at it. The proxies we've used (IQ, standardized tests, credentials) have shaped how we develop intelligence and how we value it, often in ways we later regret. AI gives us a chance to do better. The field is young enough that the methodologies for measuring what these systems can actually do are still being written, and the answers we settle on will shape what gets built, what gets deployed, and which workflows get automated next.

Vals is building the measurement layer for the AI economy: the benchmarks, methodologies, and standards that determine which models ship and where they get trusted. We're hiring a Head of Research to lead it.

The hard research questions don't have textbook answers yet. How do you measure whether an LLM can actually do a real lawyer's contract review, a real underwriter's risk assessment, a real radiologist's read? How do you build evaluations that hold up as models get better at gaming them? You'll be the person setting the direction on how Vals, and by extension much of the field, answers them.

Concretely, you'll:

- Advance the science of evaluation. The methodologies the field uses today (judge models, human-in-the-loop, static benchmarks) were built for a previous generation of models and break down on long-horizon, real-world tasks. You'll develop the new paradigms.
- Oversee Vals' broader research portfolio, setting direction across the projects already underway and the ones we haven't started yet.
- Publish work that moves the field forward. We want Vals' research to be cited, not just shipped.
- Recruit and grow a research team alongside the founders.
- Work directly with our enterprise customers and lab partners on the evaluation problems they actually have.
Job Type: Full-time
Career Level: Senior
Education Level: Ph.D. or professional degree