Staff AI Research Scientist - Evaluation, Handshake AI

Handshake • San Francisco, CA

93d

About The Position

As a Staff Research Scientist, you will drive frontier research on how we define the intelligence of frontier models: developing benchmarks and measurements that help the research community understand how large language models (LLMs) understand, reason, and interact with human knowledge. You will lead teams of researchers to produce original research in LLM evaluation methodologies, interpretability, and human-AI knowledge alignment, and you will develop novel frameworks and assessment techniques that reveal deep insights into model capabilities, limitations, and emergent behaviors. Collaborating with engineers, you will translate research breakthroughs into scalable benchmarks, evaluation systems, and standards. You will pioneer new approaches to measuring reasoning, alignment, and trustworthiness in frontier AI systems, and you will author high-quality code that enables large-scale experimentation, reproducible evaluation, and knowledge assessment workflows. You will also publish in top-tier conferences and journals, establishing new directions in the science of AI evaluation, and work cross-functionally with leadership, engineers, and external partners to set industry standards for responsible AI evaluation and alignment.

Requirements

PhD or equivalent research experience in machine learning, computer science, cognitive science, or a related field, with a focus on AI evaluation, interpretability, or model understanding.
6+ years of post-PhD academic or industry experience in a research-first environment.
Strong background in LLM research, evaluation methodologies, and/or foundational AI assessment techniques.
Proven ability to independently design, lead, and execute evaluation research programs with novel data types end-to-end.
Deep proficiency in Python and PyTorch for large-scale model analysis, benchmarking, and evaluation.
Experience building or leading novel benchmark development, systematic model assessment, or interpretability studies.
Strong publication record in post-training, evaluation, or interpretability that demonstrates field-defining contributions.
Ability to clearly communicate complex insights and influence both technical and non-technical stakeholders.

Nice To Haves

Experience with RLHF, agent modeling, or AI alignment research.
Familiarity with data-centric AI approaches, synthetic data generation, or human-in-the-loop systems.
Understanding of challenges in scaling foundation models (training stability, safety, inference efficiency).
Contributions to open-source libraries or research tooling.
Interest in the societal impact, deployment ethics, and governance of frontier AI systems.

Responsibilities

Lead teams of researchers to produce original research in LLM evaluation methodologies, interpretability, and human-AI knowledge alignment.
Develop novel frameworks and assessment techniques that reveal deep insights into model capabilities, limitations, and emergent behaviors.
Collaborate with engineers to translate research breakthroughs into scalable benchmarks, evaluation systems, and standards.
Pioneer new approaches to measuring reasoning, alignment, and trustworthiness in frontier AI systems.
Author high-quality code to enable large-scale experimentation, reproducible evaluation, and knowledge assessment workflows.
Publish in top-tier conferences and journals, establishing new directions in the science of AI evaluation.
Work cross-functionally with leadership, engineers, and external partners to set industry standards for responsible AI evaluation and alignment.

Benefits

Equity in a fast-growing company.
401(k) match, competitive compensation, financial coaching.
Paid parental leave, fertility benefits, parental coaching.
Medical, dental, and vision coverage; mental health support; $500 wellness stipend.
$2,000 learning stipend, ongoing development.
Stipends for home office setup, internet, commuting, and free lunch/gym in our SF office.
Flexible PTO, 15 holidays + 2 flex days, winter #ShakeBreak where our whole office closes for a week!
Team outings & referral bonuses.

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

Ph.D. or professional degree

Number of Employees

501-1,000 employees
