We're looking for Research Engineers to build the evaluations that tell us — and the world — what Claude can actually do. Your work will turn ambiguous notions of "intelligence" into clear, defensible metrics that researchers, leadership, and the public can rely on. You'll design and implement evaluations across the full spectrum of Claude's capabilities and personality, and build the infrastructure that runs them reliably at scale. You'll partner closely with researchers throughout the lifecycle of a new capability — from defining what to measure, to running the eval against live training checkpoints, to interpreting the results. The goal is to make Anthropic the leader in extremely well-characterized AI systems, with performance that is exhaustively measured and validated across the tasks that matter.
Job Type: Full-time
Career Level: Senior