Member of Technical Staff - Research

Polymath • San Francisco, CA

About The Position

Polymath is an applied research lab focused on advancing long-horizon agent capabilities through reinforcement learning. We design and scale simulation environments where agents learn to operate safely and autonomously. We work with the world’s leading model labs to push the frontier of agent capabilities. Polymath is backed by Base10, Founders Future, Y Combinator, and other incredible investors & angels. We've raised an $8M seed and are growing out our founding team.

About The Role

We’re hiring a Member of Technical Staff - Research to help advance the frontier of autonomous agents. You’ll work on core research problems in long-horizon evaluation, agent post-training, and environment design, with a focus on understanding where current models fail and how to improve them. As a member of the founding team, you should expect to wear multiple hats: building benchmarks, creating environments, writing production code, and running rigorous experiments. We’re looking for people who are excited by hard, open-ended problems and want to operate at the intersection of research and engineering.

Examples of projects you could work on include:

Developing an advanced environment simulation engine for training & evaluating autonomous AI agents
Investigating failure modes of frontier models
Creating rigorous benchmarks that evaluate how well frontier agents perform on complex, realistic tasks requiring long-horizon reasoning and tool use in dynamic environments
Post-training agents in complex simulation environments
Publishing research

Requirements

Strong engineering & research fundamentals
Prolific user of AI tools
Experience post-training frontier models
Experience shipping reliable, production-quality code
Track record of publications

Responsibilities

Developing an advanced environment simulation engine for training & evaluating autonomous AI agents
Investigating failure modes of frontier models
Creating rigorous benchmarks that evaluate how well frontier agents perform on complex, realistic tasks requiring long-horizon reasoning and tool use in dynamic environments
Post-training agents in complex simulation environments
Publishing research
Building benchmarks
Creating environments
Writing production code
Running rigorous experiments