We’re seeking a Senior LLM Evals Engineer to build the evaluation and verification layer for agentic LLM systems that act in complex environments and drive autonomous workflows. You’ll design eval suites, automated verifiers, and regression gates that measure real progress on long-horizon planning, agent execution, uncertainty retirement, and end-to-end build success. This role spans systems engineering, rigorous experimentation, and tight collaboration with LLM scientists, agent/toolchain engineers, and simulation teams.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed