We’re hiring an Eval Engineer to design and run creative evaluations of new AI capabilities. Your job is to turn emerging AI ideas into measurable experiments and publish the results for the developer ecosystem.

When new models, agents, or frameworks appear, everyone has opinions about what works, but few people actually test those claims. This role exists to change that.

You’ll design experiments that compare models, prompts, and agent architectures against real tasks. You’ll build the datasets, scoring logic, and evaluation harnesses. Then you’ll publish the results so builders understand what actually works.

This role sits at the intersection of engineering, experimentation, and technical storytelling.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 101-250 employees