Research Scientist - Frontier Data (ONSITE IN SF)

PulseRise Technologies
San Francisco, CA
Onsite

About The Position

We are looking for a Research Scientist - Frontier Data for a hands-on, high-leverage research role. You will design the datasets and evaluation frameworks that shape how frontier models are trained and measured. Working directly with research teams at the world's top AI labs, you will experiment with data collection strategies, diagnose model failure modes, and develop the metrics that determine whether a model is actually getting better. This is not a theorizing role: you will move quickly from hypothesis to live experiment, and your output will directly influence model training runs at scale. The team is small, the impact is outsized, and individual contributors here have a direct line to how the next generation of models learns and improves.

Requirements

  • Strong quantitative instincts and familiarity with LLM training pipelines, RLHF or RLVR, or evaluation methodology. A PhD is not required, but you must have the research depth of a strong undergraduate or master's researcher
  • Genuine obsession with how data structure, selection, and quality drive model behavior. This is the core of the work and must be intrinsically motivated
  • Ability to design lightweight experiments, move fast, and extract actionable insights from messy and incomplete results
  • Comfort working across domains: the work touches finance, software engineering, policy, and more. Must be able to context-switch and reason clearly across all of them
  • Bias toward building over theorizing. You ship experiments and iterate rather than getting stuck in design

Nice To Haves

  • Prior work or internship at RL environment companies, AI safety organizations, or benchmarking organizations such as METR or Artificial Analysis
  • Background in evaluation methodology, benchmark design, or dataset curation at a lab or research organization
  • Exposure to annotator modeling, reward signal design, or alignment-related research

Responsibilities

  • Design data slices and explore data shapes that expose meaningful model failure modes across domains, including finance, code, and enterprise workflows
  • Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
  • Model annotator behavior and run experiments to improve different model capabilities
  • Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
  • Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications
  • Move fast from hypothesis to experiment, extract actionable insights from messy results, and iterate quickly