Cluster Deployment Engineer

Anthropic
Hybrid

About The Position

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role

As a Cluster Deployment Engineer, you will own how large-scale AI compute clusters physically come together inside our datacenter fleet. You will set the deployment-engineering strategy for cluster build-out — how racks are organized into pods, halls, and sites; how compute, network, power, and cooling systems interface at the rack boundary; and how deployment scope flows cleanly from hardware specification to facility delivery to a running cluster. This role is focused on deployment engineering, not on datacenter network or systems design — your scope is making sure clusters land cleanly and predictably, not designing the fabrics or facilities themselves.

This is a senior individual-contributor role with broad technical influence. You will work across hardware, networking, facilities, supply chain, and construction to ensure that every generation of accelerator we deploy lands in a datacenter that is ready for it — on schedule, at full density, and with every piece of required infrastructure accounted for. You will be the person who sees around corners: anticipating how next-generation rack designs will stress our facilities, where our deployment model will break at scale, and what needs to change now so that the next cluster turn-up is faster and more predictable than the last.

You will operate at the intersection of engineering strategy and execution discipline, partnering with internal research and systems teams, external developers, engineering firms, and OEM partners to deliver cluster capacity at the speed the frontier demands.

Requirements

  • Have 10+ years of experience in hyperscale datacenter environments, with senior-level responsibility for cluster deployment, large-scale IT integration, or equivalent infrastructure programs.
  • Have delivered AI, HPC, or high-density compute clusters at scale and developed a strong intuition for the constraints that govern cluster deployment — interconnect reach, adjacency, power density, and thermal limits.
  • Can operate fluently across the boundary between IT hardware and facility infrastructure, and have set interface standards that held up across multiple hardware generations and sites.
  • Have led cross-functional programs with both internal engineering teams and external developers, engineering firms, and OEM partners, and are effective at driving alignment across organizational levels.
  • Combine strong systems thinking with execution discipline — comfortable zooming from cluster topology and portfolio strategy down to the specific interface detail that will otherwise become a field issue.
  • Communicate clearly with technical and executive audiences, and can distill complex, multi-disciplinary programs into decisions and tradeoffs leadership can act on.
  • Thrive in ambiguous, fast-moving environments where the hardware, the scale, and the requirements are all changing simultaneously.
  • Hold a Bachelor's degree in Electrical Engineering, Mechanical Engineering, Computer Engineering, or equivalent practical experience.

Nice To Haves

  • Have direct experience deploying leading-edge AI accelerator clusters at hyperscale.
  • Have shaped reference designs, deployment standards, or cluster-level playbooks that were adopted across a fleet.
  • Have experience working across multiple geographies and understand how regional codes, climate, utility constraints, and supply chains shape cluster-level decisions.
  • Have partnered closely with hardware and system providers on long-term platform onboarding and bring-up.
  • Have experience building the program mechanisms — roadmaps, milestones, risk registers, runbooks — that make delivery predictable at massive scale.

Responsibilities

  • Own cluster-level deployment strategy — define how AI compute clusters are organized across the floor, how racks interconnect, and how cluster topology requirements translate into facility and deployment scope across a portfolio of sites.
  • Set rack interface standards spanning power, network, mechanical, thermal, and spatial domains, and ensure that every deployment includes the complete set of infrastructure required to bring a cluster online.
  • Drive multi-threaded cluster bring-up programs across hardware, networking, power, and cooling — owning plans, dependencies, and critical paths from hardware specification through energization and turn-up.
  • Partner with internal engineering teams — research, systems, networking, and hardware — to translate cluster requirements into deployable facility scope, and to derisk onboarding of new hardware platforms well ahead of delivery.
  • Lead external partner execution with developers, engineering firms, OEMs, and construction teams, driving technical reviews, deviation management, and handoffs that keep deployments on schedule and within specification.
  • Improve cluster turn-up reliability and repeatability — identify systemic gaps in deployment scope, tooling, and partner interfaces, and drive durable fixes that reduce time-to-serve for new capacity.
  • Define and track deployment KPIs — cluster readiness, schedule adherence, scope completeness, time-to-first-packet — and use historical trends to forecast risk and inform capacity planning.
  • Coordinate cross-functional readiness across supply chain, security, operations, and construction to ship production-ready compute capacity.
  • Provide crisp executive visibility on deployment progress, tradeoffs, and risks across a portfolio of concurrent cluster programs.
  • Design cluster interfaces for durability — define rack and cluster-level interfaces that remain robust across hardware generations, so that facility scope and deployment models do not need to be reinvented every time the underlying hardware changes.
  • Build cluster layout and BOM tooling — create and maintain the tools, templates, and data models that turn cluster topology and rack specifications into accurate floor layouts, deployment sequences, and complete bills of materials, replacing one-off spreadsheets with repeatable, auditable workflows.

Benefits

  • We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.