Member of Technical Staff, Model Evaluation

Mirendil•United States, CA

1d•Remote

About The Position

Mirendil is a tech-first company focused on removing the core bottlenecks that hold back step-change acceleration across science and technology. Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe accelerating scientific discovery is one of the most powerful ways to improve the future of humanity, and that AI will play a central role in making that possible. We are building a frontier AI research company and training our own models end-to-end. Our work spans areas such as model training, reinforcement learning, reasoning systems, and infrastructure for large-scale experiments. Our team includes researchers and engineers from Anthropic, Google DeepMind, xAI, OpenAI, Microsoft, Apple, and MIT.

Requirements

Build the evaluation infrastructure that tells us whether our models are getting better in ways we care about.
Own the frameworks, pipelines, and tooling that measure model behavior across capabilities.

Responsibilities

Design and build evaluation frameworks that measure model capabilities along realistic axes, beyond standard benchmarks.
Build automated eval pipelines and regression-detection systems that run continuously and surface signal quickly.
Develop agent-assisted workflows that let humans inspect model behavior efficiently.
Instrument training runs with observability tooling so researchers can understand what's changing in model behavior, and why.
Partner with post-training and RL teams to close the loop between eval signal and training decisions.