Anthropic runs some of the largest Kubernetes clusters in the industry, with fleets of hundreds of thousands of nodes across multiple cloud providers and datacenters for training, research, and serving frontier AI models. The Kubernetes Platform team owns the Kubernetes control plane that makes these clusters work. The team operates at a scale where defaults no longer apply: it owns the scheduler and extends it for topology-sensitive ML workloads across thousands of accelerators. It also scales the control plane components (apiserver, etcd, controllers) so they stay responsive through orders-of-magnitude growth in object and node counts, and builds core cluster services such as service discovery. The role directly impacts Anthropic's ability to train frontier models reliably and safely as the compute footprint grows.
Job Type
Full-time
Career Level
Senior