Research Fellowship - Open Endedness

Vmax
San Francisco, CA
Hybrid

About The Position

Vmax is an applied research lab focused on developing AI capable of open-ended learning. The lab aims to build systems that can exceed human capabilities by optimizing beyond the local maxima of learning from human expertise. A key focus is agents that can discover their own objectives in the world.

This fellowship is designed for PhD students or equivalent early-career researchers interested in working on Large Language Models (LLMs) that can learn in open-ended settings. Fellows will lead a focused research project, collaborate with Vmax technical staff, and contribute to research publications. The fellowship typically lasts 3 to 6 months.

Requirements

  • Currently enrolled in a PhD program in machine learning, computer science, artificial intelligence, computational neuroscience, mathematics, or a related technical field. Exceptional candidates with equivalent research experience may also be considered.
  • Track record of research excellence or strong research promise, demonstrated through publications, preprints, open-source work, technical projects, competitions, or publicly available artifacts.
  • Working understanding of reinforcement learning.
  • Familiarity with unsupervised/automated environment design, asymmetric self-play, and/or intrinsic motivation.
  • Strong programming ability in Python.
  • Experience with at least one major ML framework such as PyTorch or JAX.
  • Clear written and verbal communication of technical ideas.

Nice To Haves

  • Experience with LLM post-training methods.
  • Experience with scalable ML experimentation, distributed training, experiment tracking, or reproducible research infrastructure.
  • Demonstrated taste for identifying non-obvious research directions and converting them into tractable experiments.

Responsibilities

  • Develop Reinforcement Learning (RL) methods for agents that can discover useful objectives, tasks, and curricula without relying solely on human-specified rewards.
  • Design systems for open-ended learning, including unsupervised/automated environment design, asymmetric self-play, and intrinsic motivation.
  • Build training loops where agents learn from interaction, exploration, novelty, competence progress, self-generated challenges, or other non-standard reward signals.
  • Investigate methods for agents to avoid collapsing into trivial, degenerate, or easily exploitable objectives.
  • Own and develop a research agenda within Vmax, from identifying promising research directions to executing experiments and communicating results.