Research Compute Operations

Anthropic · San Francisco, CA · Hybrid

About The Position

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

Anthropic’s researchers use internal tooling and infrastructure to run the experiments that advance AI safety and capability. This role owns the researcher experience with that tooling: both the day-to-day support and the longer-term product vision. You'll be the person researchers come to when they need help, and the person driving improvements and automation to make that manual help unnecessary over time. This role sits on the Capacity Operations team, at the intersection of research and infrastructure.

Requirements

  • Have an engineering background (or equivalent technical depth) and have transitioned into or are drawn to product management, technical operations, or systems design work
  • Can query data, understand infrastructure, debug issues, and build tools and scripts to prototype solutions quickly
  • Are a systems thinker: when a researcher hits a confusing error, you don't just fix it; you ask why the system produced it and how to prevent it for everyone
  • Are comfortable navigating ambiguity across teams and context-switching between tactical support and strategic design
  • Use Claude or other AI tools daily and are excited to teach others your best practices
  • Hold at least a Bachelor's degree in a related field or have equivalent experience

Nice To Haves

  • An understanding of compute infrastructure and familiarity with concepts like rate limiting, autoscaling, and request prioritization
  • Background in ML infrastructure, ML engineering, or research engineering
  • Experience with large-scale accelerator clusters (TPUs, GPUs, or similar)
  • Familiarity with ML training pipelines and how they consume inference capacity
  • Track record of building internal tools or developer platforms that people actually love using
  • Experience in developer experience (DevEx) or platform engineering

Responsibilities

  • Serve as a primary point of contact for researchers using internal compute infrastructure, including triaging access issues, resolving researcher requests, and real-time monitoring
  • Proactively monitor usage patterns and work with researchers to optimize their workloads
  • Help design the product roadmap for research inference tooling: gather user feedback, prioritize improvements, and drive execution
  • Prototype better tools: dashboards, automations, self-service workflows, and more intuitive interfaces for complex systems
  • Build automations (using Claude) for common operational workflows

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • A lovely office space in which to collaborate with colleagues