TPM Manager, Compute & Infrastructure

Anthropic
Seattle, WA
Hybrid

About The Position

Anthropic’s Compute and Infrastructure organizations are responsible for the systems that train our models, serve our products, and support our engineering teams. That includes datacenter operations, capacity planning across cloud providers and our own facilities, accelerator cluster management, production serving infrastructure, developer tooling, data pipelines, and networking. It’s a lot of surface area, and the demands on it are growing fast.

We’re hiring a TPM leader to own program management across this whole ecosystem, from compute supply through to production workloads. Today, there are a few TPMs working in these areas but no dedicated TPM team, and you will build it. We’re bringing up new datacenters, scaling multi-cloud compute across AWS, GCP, and Azure, managing datacenter construction, and building the software infrastructure to keep pace. The team is currently small but expected to grow very quickly, and we’re looking for a senior leader with experience at scale to build and grow it in support of Anthropic’s rapid growth.

You’ll report to the Head of TPM, partnering closely with engineering leaders on technical strategy, roadmapping, and aligning TPM support where it is most impactful. Expect to spend most of your time as an IC at the start: you’ll personally drive 2–3 critical programs while hiring your team in parallel. As the team grows, you’ll shift more toward people leadership, but this is a role where you need to be comfortable doing the work yourself before you can hand it off.

Requirements

  • Have 10+ years of experience in technical program management, with 7+ years directly managing TPMs and ideally some experience leading larger TPM organizations
  • Have built a team or function from scratch before—you know the difference between hiring for a defined role vs. figuring out what the roles should be
  • Have scaled TPM teams to support rapidly growing, fast-moving company environments
  • Have worked across physical and software infrastructure—datacenters, networking, hardware ops, distributed systems, cloud platforms, developer tooling. You don’t need to be deep in all of it, but you need to be conversant enough to ask the right questions and spot the real risks.
  • Have run large-scale compute or infrastructure programs—capacity planning, cluster deployments, datacenter build-outs, cloud migrations, or similar
  • Can communicate complex programs clearly to senior leadership without losing the important details
  • Are good at context-switching between doing the work and managing people, and don’t see the IC work as beneath you
  • Are comfortable making staffing and prioritization decisions without perfect information
  • Have at least a Bachelor’s degree in a related field, or equivalent experience

Responsibilities

  • Own and drive 2–3 of the highest-priority programs across compute and infrastructure while you build the team
  • Run the actual programs—datacenter bring-up timelines, capacity scaling plans, infrastructure migrations, cross-team reliability efforts, or whatever the most pressing needs are
  • Build the processes and playbooks as you go—figure out what works by doing it, then codify it for the team
  • Earn credibility with engineering leads through solid execution, not just strategy
  • Build a TPM team largely from scratch: define roles, write JDs, source candidates, close hires
  • Set the standard for what good TPM work looks like in this domain through your own output
  • Coach and develop TPMs
  • Transition programs to your team as you hire
  • Work with various engineering leads to identify work that would most benefit from TPM support
  • Make real tradeoffs about what to staff vs. what to skip given limited TPM capacity during the build phase
  • Maintain portfolio-level visibility across programs—status, risks, dependencies, blockers
  • Represent the team in planning cycles and leadership reviews
  • Coordinate across Compute, Infrastructure, and partner teams (Research, Product, Security, Finance, Legal) on programs that span organizational boundaries
  • Drive alignment on programs that cross the hardware/software line—e.g., capacity plans that feed into training schedules, or efficiency work that spans accelerator kernels and serving systems
  • Own executive communication on program status, risks, and resource needs

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • A lovely office space in which to collaborate with colleagues