(Senior) Research Engineer / Research Scientist

METR•Berkeley, CA

1d•$250,000 - $450,000•Hybrid

About The Position

We are offering a $21k referral bonus for this role. You can refer people through our form, which lists the terms of this bonus.

About METR

We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. Our work advances the science of AI measurement by understanding frontier AI systems' ability to complete complex tasks without human input, and directly executing those measurements to inform risk assessments and consensus within the AI industry, among policymakers, and the public. Our work has been cited by NIST, a previous US President, the UK Government, Nature, The New York Times, and Time Magazine. Through our work with leading AI labs, governments, and academia, we ensure that our insights can quickly be leveraged to promote the safe development of increasingly powerful AI systems.

We believe it is robustly good for civilization to have a clear understanding of what types of danger AI systems pose and how high the risk is, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.

What We're Looking For

We're seeking a researcher to help us better understand AI capabilities. Previous work in this vein includes agent time horizons, a commonly used metric for measuring AI progress, and RCTs on open-source developer productivity. We're excited about candidates from a wide range of backgrounds. If you're scrappy, smart, and driven to better understand model capabilities, please apply - we're excited to chat with researchers, engineers, and startup founders alike.

Requirements

You can write code. At the very least, you should be able to quickly write a data analysis script in Python to answer an important question. Bonus points if you can write a clean PR too.
You're excited to get your hands dirty. METR researchers often interact with LLMs in a wide variety of scenarios, read lots of agent transcripts, and closely review human outputs (e.g. video recordings of developers in our productivity RCT).
You are undaunted by open-ended mandates. You can take a confusing or ill-posed question and produce insightful and helpful frameworks/proposals/results.
You should be able to read, understand, and critique a research proposal.
You're able to understand how particular projects fit into METR's overall mission.
You're a good written communicator. Bonus points if you can write a great paper.
You work fast and are highly reliable.

Nice To Haves

Bonus points if you can write a clean PR too.
Bonus points if you can write a great paper.

Responsibilities

Lead a project investigating transcripts as a source of evidence about agent capabilities.
Create metrics that speak to the degree of uplift AI agents provide, and collect these metrics from AI R&D-relevant companies.
Improve METR's time-horizon metric ("Moore's law for AI agents") to make it more externally valid, more interpretable, and more predictive of threat-model-relevant capabilities.
Improve this metric to be the single most useful source of evidence for interpreting the rate of AI progress.
Design and build experiments testing agent capabilities in the wild.
Create a new source of evidence for us to better triangulate agent capabilities and limitations.
Lead large-scale human-subjects experiments measuring the impacts of AI agents on economically valuable R&D.
