Research Engineer - Post-Training & RL

Techire Ai•San Francisco, CA

Onsite

About The Position

Want to build the simulated worlds that test what frontier models are really capable of? This is a chance to join a team advancing the science of post-training and scalable evaluation by building reinforcement learning environments that push reasoning, planning, and long-horizon behaviour to their limits. Instead of static benchmarks, you’ll create dynamic simulations that measure real intelligence, not just accuracy. You’ll design new post-training algorithms (RLHF, DPO, GRPO and beyond), develop richer reward models that move past exact-match scoring, and build evaluation frameworks that define how next-generation AI is trained, aligned, and understood. The work combines deep research with hands-on implementation, from writing papers to seeing your methods deployed in live systems. It’s ideal for researchers who want to bridge academic insight and practical impact, and to help AI move beyond metrics that no longer tell the whole story.

Requirements

Research experience in post-training, reinforcement learning, or evaluation for LLMs.
Strong understanding of transformer models and experimental design.
Publication record at leading venues (NeurIPS, ICLR, ICML, ACL, EMNLP).
PhD or equivalent research experience in CS, ML, NLP, or RL.

Responsibilities

Build simulated worlds to test frontier models.
Advance the science of post-training and scalable evaluation.
Build reinforcement learning environments that push reasoning, planning, and long-horizon behaviour to their limits.
Create dynamic simulations, rather than static benchmarks, that measure real intelligence, not just accuracy.
Design new post-training algorithms (RLHF, DPO, GRPO and beyond).
Develop richer reward models that move past exact-match scoring.
Build evaluation frameworks that define how next-generation AI is trained, aligned, and understood.
Combine deep research with hands-on implementation.
Write papers and deploy methods in live systems.