Staff+ Software Engineer, Infrastructure

Anthropic • San Francisco, CA

Hybrid

About The Position

Anthropic is seeking talented and experienced Infrastructure Engineers to join our team and support the development, scaling, and maintenance of our cutting-edge AI systems. By joining our Infrastructure team, you will work on groundbreaking AI technologies and contribute to the development of frontier models, supporting Anthropic's mission to create safe and reliable AI systems that benefit humanity.

Anthropic's Infrastructure organization is the engine that powers our mission to develop AI systems that are safe, beneficial, and understandable. Every breakthrough in AI safety research and every interaction users have with Claude depends on the systems we build and operate: massive clusters for training, production infrastructure serving millions of users reliably, and developer platforms that help engineers move fast without breaking things.

This isn't typical infrastructure work. We're building at the frontier of what's possible, solving novel scaling challenges that few organizations face while maintaining a high degree of security, all in service of ensuring transformative AI benefits humanity. If you're energized by technical work that directly enables some of the most important research happening today, Infrastructure at Anthropic is the best place to make a real difference.

We have multiple teams that are currently hiring. Team placement occurs after the interview process, taking into account your interests and experience alongside organizational needs. This flexible approach allows us to match talented engineers with the infrastructure teams where they'll have the greatest impact and growth potential.

Requirements

Have 10+ years of relevant industry experience and 3+ years leading large-scale, complex projects or teams as an engineer or tech lead
Are obsessed with distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement
Have strong proficiency in at least one programming language (e.g., Python, Rust, Go, Java)
Have strong problem-solving skills and the ability to work independently
Have a passion for supporting internal partners, such as research teams, and understanding their needs
Have excellent communication skills to build consensus with both internal and external stakeholders
Possess deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP
Have at least a Bachelor's degree in a related field, or equivalent experience

Nice To Haves

Expertise in security and privacy best practices
Experience with machine learning infrastructure such as GPUs, TPUs, or Trainium, as well as supporting networking infrastructure such as NCCL
Low-level systems experience, for example Linux kernel tuning and eBPF
Technical expertise: quickly understanding systems design tradeoffs and keeping track of rapidly evolving software systems

Responsibilities

Lead the build-out of industry-leading AI clusters (thousands to hundreds of thousands of machines), partnering closely with cloud service providers on cluster build-out and required features
Consult with different stakeholders to deeply understand infrastructure, data, and compute needs, identifying potential solutions to support frontier research and product development
Set technical strategy and oversee development of high-scale, reliable infrastructure systems
Mentor top technical talent
Design processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice

Benefits

We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.
