Research Scientist - Frontier Data (ONSITE IN SF)

PulseRise Technologies
San Francisco, CA
Onsite

About The Position

We are looking for a Research Scientist - Frontier Data for a hands-on, high-leverage research role. You will design the datasets and evaluation frameworks that shape how frontier models are trained and measured. Working directly with research teams at the world's top AI labs, you will experiment with data collection strategies, diagnose model failure modes, and develop the metrics that determine whether a model is actually getting better. This is not a theorizing role: you will move quickly from hypothesis to live experiment, and your output will directly influence model training runs at scale. The team is small, the impact is outsized, and individual contributors here have a direct line to how the next generation of models learns and improves.

Requirements

  • Strong quantitative instincts and familiarity with LLM training pipelines, RLHF or RLVR, or evaluation methodology. A PhD is not required, but you must have the research depth of a strong undergraduate or master's researcher
  • Genuine obsession with how data structure, selection, and quality drive model behavior. This is the core of the work and must be intrinsically motivated
  • Ability to design lightweight experiments, move fast, and extract actionable insights from messy and incomplete results
  • Comfort working across domains: the work touches finance, software engineering, policy, and more. Must be able to context-switch and reason clearly across all of them
  • Bias toward building over theorizing. You ship experiments and iterate rather than getting stuck in design

Nice To Haves

  • Prior work or internship at RL environment companies, AI safety organizations, or benchmarking organizations such as METR or Artificial Analysis
  • Background in evaluation methodology, benchmark design, or dataset curation at a lab or research organization
  • Exposure to annotator modeling, reward signal design, or alignment-related research

Responsibilities

  • Design data slices and explore data shapes that expose meaningful model failure modes across domains, including finance, code, and enterprise workflows
  • Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
  • Model annotator behavior and run experiments to improve different model capabilities
  • Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
  • Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications
  • Move fast from hypothesis to experiment, extract actionable insights from messy results, and iterate quickly