TPM Manager, Compute & Infrastructure

Anthropic•Seattle, WA

6h•Hybrid

About The Position

Anthropic’s Compute and Infrastructure organizations are responsible for the systems that train our models, serve our products, and support our engineering teams. That includes datacenter operations, capacity planning across cloud providers and our own facilities, accelerator cluster management, production serving infrastructure, developer tooling, data pipelines, and networking. It’s a lot of surface area, and the demands on it are growing fast.

We’re hiring a TPM leader to own program management across this whole ecosystem, from compute supply through to production workloads. Today, there are a few TPMs working in these areas but no dedicated TPM team, and you will build it. We’re bringing up new datacenters, managing datacenter construction, scaling multi-cloud compute across AWS, GCP, and Azure, and building the software infrastructure to keep pace. The team is currently small but expected to grow very quickly, and we’re looking for a senior leader with experience at scale to build it out in support of Anthropic’s rapid growth.

You’ll report to the Head of TPM and partner closely with engineering leaders on technical strategy, roadmapping, and aligning TPM support where it is most impactful. Expect to spend most of your time as an IC at the start: you’ll personally drive 2–3 critical programs while hiring your team in parallel. As the team grows, you’ll shift more toward people leadership, but this is a role where you need to be comfortable doing the work yourself before you can hand it off.

Requirements

Have 10+ years of experience in technical program management, with 7+ years directly managing TPMs and ideally some experience leading larger TPM organizations
Have built a team or function from scratch before—you know the difference between hiring for a defined role vs. figuring out what the roles should be
Have scaled TPM teams to support rapidly growing, fast-moving company environments
Have worked across physical and software infrastructure—datacenters, networking, hardware ops, distributed systems, cloud platforms, developer tooling. You don’t need to be deep in all of it, but you need to be conversant enough to ask the right questions and spot the real risks.
Have run large-scale compute or infrastructure programs—capacity planning, cluster deployments, datacenter build-outs, cloud migrations, or similar
Can communicate complex programs clearly to senior leadership without losing the important details
Are good at context-switching between doing the work and managing people, and don’t see the IC work as beneath you
Are comfortable making staffing and prioritization decisions without perfect information
We require at least a bachelor’s degree in a related field or equivalent experience.

Responsibilities

Own and drive 2–3 of the highest-priority programs across compute and infrastructure while you build the team
Run the actual programs—datacenter bring-up timelines, capacity scaling plans, infrastructure migrations, cross-team reliability efforts, or whatever the most pressing needs are
Build the processes and playbooks as you go—figure out what works by doing it, then codify it for the team
Earn credibility with engineering leads through solid execution, not just strategy
Build a TPM team largely from scratch: define roles, write JDs, source candidates, close hires
Set the standard for what good TPM work looks like in this domain through your own output
Coach and develop TPMs
Transition programs to your team as you hire
Work with various engineering leads to identify work that would most benefit from TPM support
Make real tradeoffs about what to staff vs. what to skip given limited TPM capacity during the build phase
Maintain portfolio-level visibility across programs—status, risks, dependencies, blockers
Represent the team in planning cycles and leadership reviews
Coordinate across Compute, Infrastructure, and partner teams (Research, Product, Security, Finance, Legal) on programs that span organizational boundaries
Drive alignment on programs that cross the hardware/software line—e.g., capacity plans that feed into training schedules, or efficiency work that spans accelerator kernels and serving systems
Own executive communication on program status, risks, and resource needs