Member of Technical Staff, Research

METR•Berkeley, CA

64d•$250,000 - $450,000•Hybrid

About The Position

We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. We believe it is robustly good for civilization to have a clearer understanding of what dangers AI systems pose, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time. We evaluate candidates primarily through work tests. We usually do an in-person trial as well but can be flexible about this. METR currently has 3 primary research streams: Capabilities: Accurately measuring frontier model performance on threat-relevant tasks (autonomy, AI R&D automation, etc.) and predicting future capabilities. We develop and maintain benchmarks, diverse evidence-gathering methods, and metrics to track capability trends and anticipate the thresholds that matter most for safety. Monitorability: Understanding how well frontier models can take subversive or unwanted actions despite various monitoring or control protocols. We build the research infrastructure â novel metrics, control evaluations, elicitation methods â needed to improve the world's understanding of how effectively current and future models can circumvent oversight. Alignment/Propensity: Determining whether or not a model that is capable of causing catastrophic harm (in its actual deployment setting) would be likely to actually do so in a given high-stakes deployment setting. We aim to develop the science of propensity evaluations and examine when we might expect high-stakes catastrophic misalignment. The Capabilities and Monitorability streams are both hiring Research ICs (individual contributors), while the Alignment/Propensity stream is hiring for a Research Stream Lead, followed by Research ICs down the line. The stream you end up joining will be based on a combination of working fit and interest. For our Research IC roles, we are looking for a combination of skills across âresearch scienceâ, âresearch executionâ and software engineering. You may not have all of these skills (for example, we donât expect software engineering to be a large part of the role for narrowly focused researchers). For the Research Stream Lead role, we are additionally looking for research management skills. We're seeking a researcher to help us better understand AI capabilities. Previous work in this vein includes agent time horizons, a commonly-used metric for measuring AI progress, and RCTs on open-source developer productivity.

Requirements

Strong knowledge of relevant literature and general research good practice.
Good understanding of how particular projects fit into METR's overall mission - you are thinking about things like "how will this generalize to future models", or "how does this relate to alignment evals".
Reliably notice important but subtle methodological limitations.
Undaunted by open-ended mandates - you can take a confusing or ill-posed question and produce insightful and helpful frameworks / proposals / results.
You can write great papers.
Experienced executor/contributor; you are familiar with patterns of successful and unsuccessful execution in frontier ML research. You are undaunted by "I've never done this before" or even "no-one has done this before".
Creative, ambitious and entrepreneurial. You work fast and are highly responsive and available. You can juggle many balls when it is useful.
Balance rapid prototyping with the creation of maintainable, scalable systems and make sound technical decisions.
Lead large projects from ideation to delivery, balancing innovative ML solutions with reliable, high-quality code.
Set high standards for system architecture, code quality, and maintainability, influencing broad software practices across the organization.

Benefits

Catered lunch and dinner daily; in-office gym and shower
Stipend for moving to the Bay Areaâ
Unlimited PTO and 21-week parental leave for new parents
Monthly transit/parking stipend and an annual Uber budget
Professional development benefit: for training, courses, conferences, and AI safety educationâ
Mental health benefit: for therapy, medication, and other mental health expensesâ
Wellness benefit: for gym memberships and other wellness expensesâ
Work equipment benefit: for home office and workstation equipmentâ expenses

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume