In this role, you will work on projects that improve and evaluate large language models by crafting challenging, competition-level mathematics problems and rigorously assessing model reasoning. The ideal candidate has a strong foundation in competitive mathematics at the AIME, HMMT, and IMO (Olympiad) level across the four classic pillars: Algebra, Number Theory, Combinatorics, and Geometry. You should be able to design novel, "Google-proof" problems intended to expose deep reasoning deficiencies in state-of-the-art models, and to diagnose precisely where and why a model's reasoning breaks down. The role combines original problem authoring, rigorous solution writing, and detailed evaluation of model-generated responses. This is your chance to future-proof your career in an AI-first world by working at the frontier of mathematical reasoning evaluation.
Job Type: Full-time
Career Level: Entry Level
Education Level: No Education Listed