Member of Technical Staff, Post-Training, RL Infra

Mirendil • United States, CA

About The Position

Mirendil is a tech-first company focused on solving the core bottlenecks that hold back step-change acceleration across science and technology. Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe accelerating scientific discovery is one of the most powerful ways to improve the future of humanity, and that AI will play a central role in making that possible. We are building a frontier AI research company and training our own models end-to-end. Our work spans areas such as model training, reinforcement learning, reasoning systems, and infrastructure for large-scale experiments. Our team includes researchers and engineers from Anthropic, Google DeepMind, xAI, OpenAI, Microsoft, Apple, and MIT.

Requirements

We are looking for engineers to help build the post-training stack for frontier reasoning models
You will work to push the scale of our RL stack, whether through novel recipe ideas, reliability, or performance
You are excited about building the infrastructure that makes frontier RL research possible at scale

Responsibilities

Design and build reliable infrastructure for large-scale RL training
Implement novel performance optimizations across the training stack
Develop evaluation and benchmarking infrastructure to measure model progress, throughput, and uptime
Build data collection and feedback pipelines that close the loop between human signal, reward modeling, and training
Collaborate with multiple teams to rapidly iterate on RL algorithms and get experiments into production training runs