As a Staff Research Scientist, you will drive research on how we define the intelligence of frontier models: developing benchmarks and measurements that help the research community understand how large language models (LLMs) comprehend, reason about, and interact with human knowledge. You will lead teams of researchers producing original research in LLM evaluation methodologies, interpretability, and human-AI knowledge alignment. You will develop novel frameworks and assessment techniques that reveal model capabilities, limitations, and emergent behaviors. Collaborating with engineers, you will translate research breakthroughs into scalable benchmarks, evaluation systems, and standards. You will pioneer new approaches to measuring reasoning, alignment, and trustworthiness in frontier AI systems, and author high-quality code that enables large-scale experimentation, reproducible evaluation, and knowledge assessment workflows. Additionally, you will publish in top-tier conferences and journals, establishing new directions in the science of AI evaluation, and work cross-functionally with leadership, engineers, and external partners to set industry standards for responsible AI evaluation and alignment.
Job Type: Full-time
Career Level: Senior
Education Level: Ph.D. or professional degree
Number of Employees: 501-1,000 employees