RESEARCHER, AGENTS FOR AUTOMATED DISCOVERY

MakerMaker • San Francisco, CA

Onsite

About The Position

We're building autonomous research agents for recursive self-improvement: multi-agent systems that propose, run, and analyze machine learning experiments. We're a small team working on-site in San Francisco. You'll research the agents at the core of our work, designing how they plan, decompose problems, choose what to try next, evaluate their own outputs, and recover from mistakes. This is a deeply open-ended research role. The benchmarks for agents that do real research don't exist yet, and inventing them is part of the job. You'll move between method design, careful experimentation, building evaluation frameworks, and shipping into production. You'll have real autonomy, real ownership, and the corresponding responsibility for choosing well.

Requirements

Strong track record of ML research with a focus on agents, RL, LLMs, planning, tool use, or multi-agent systems
5+ years of hands-on research experience in industry or academia
Comfort designing experiments and running them end-to-end at scale
Track record of building evaluation frameworks for capabilities that aren't easily benchmarked
Bias toward shipping research, not handing it off
Strong written communication: you can compress a result into a paragraph that changes what someone else does next
Comfort with ambiguity: open-ended problems without fixed benchmarks are the work, not a frustration
Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues

Nice To Haves

PhD in ML, statistics, CS, or an adjacent field
Published research on agentic systems, tool use, long-horizon planning, multi-agent coordination, or self-improvement methods
Open-source contributions in the agentic ML ecosystem (coding agents, research assistants, autonomous workflows)
Experience with reasoning models, chain-of-thought / scratchpad methods, or supervised fine-tuning for agentic behaviors
Background in evaluation methodology for capabilities that don't have established benchmarks

Responsibilities

Design methods that improve how our agents plan, decompose tasks, use tools, manage context, and recover from failures across long-horizon research workflows
Develop multi-agent coordination patterns: how multiple agents share context, divide labor, supervise each other, and combine their outputs
Build and maintain evaluation frameworks for agent capability on open-ended tasks (the kind where the right answer isn't pre-specified)
Run rigorous experiments to characterize what works, what doesn't, and why: controls, ablations, statistical significance
Co-design agent architectures with engineering teammates; ship the most promising methods into production
Read deeply across the agentic ML, planning, RL, and tool-use literature; bring useful work from outside in
Share findings internally so the rest of the team builds on them
Help shape research direction across the team: agentic research taste compounds when discussed openly
