Member of Technical Staff, Post-Training, RL Environments

Mirendil • United States, CA

About The Position

Mirendil is a tech-first company focused on solving the core bottlenecks to step-change acceleration across science and technology. Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe accelerating scientific discovery is one of the most powerful ways to improve the future of humanity, and that AI will play a central role in making that possible. We are building a frontier AI research company and training our own models end-to-end. Our work spans model training, reinforcement learning, reasoning systems, and infrastructure for large-scale experiments. Our team includes researchers and engineers from Anthropic, Google DeepMind, xAI, OpenAI, Microsoft, Apple, and MIT.

Requirements

Build the data systems and execution environments that power reinforcement learning at Mirendil.
Own those systems end-to-end.

Responsibilities

Build and automate data collection pipelines for complex, long-horizon RL tasks.
Build robust systems to identify and prevent reward hacking.
Build scalable sandboxed execution environments for realistic tasks that may involve multiple agents, nodes, and users.
Design systems to estimate the influence of training environments on production model behavior.
Collaborate with teams across the stack to identify potential axes of improvement in production model behavior, and develop training environments that push along these axes.