Anthropic's Reinforcement Learning environments are the foundation of how Claude learns new capabilities. As we scale to massive training runs consuming trillions of tokens, we need someone to own the operational health and execution of our RL environments data pipeline.

You'll be deeply embedded with the Research, Infrastructure, and Data Operations teams: not just coordinating across them, but making hands-on technical decisions about data quality, environment configurations, and infrastructure priorities. This role requires both the technical depth to debug yield issues and configure complex ML systems, and the program management skills to coordinate multiple teams during high-stakes production runs.

This is operational technical leadership: you'll spend your time monitoring production environment health, coordinating in-flight changes during active training runs, driving infrastructure migrations, and ensuring our environment development keeps pace with our ambitious model training roadmap.