Engineering Manager - Observability

Anthropic, San Francisco, CA
Hybrid

About The Position

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Anthropic is looking for an Engineering Manager to help lead our Observability team, the group responsible for the metrics infrastructure that keeps Anthropic's most critical systems running. When metrics go down, the company can't tell how training runs are progressing or whether production inference is healthy. This is mission-critical infrastructure with real operational stakes. You'll lead a growing software engineering team, partner with strong technical leads, and manage the internal and external relationships that make a platform team successful at scale. If you've previously led a team building a metrics or observability system, and you thrive in high-operational-tempo environments, this is a rare chance to do it at a company where the infrastructure genuinely matters.

Requirements

  • Have 2+ years of engineering management experience leading observability, monitoring, or metrics infrastructure teams
  • Bring domain expertise in metrics infrastructure — you've worked with Prometheus, Grafana, time series databases, or similar technologies
  • Have experience managing an internal platform team with many stakeholders — you know how to manage competing priorities and communicate tradeoffs clearly
  • Are operationally minded — you've led teams with significant on-call burden and know how to make reliability a first-class priority
  • Are a positive, high-energy leader who creates a "we can do this" environment even when things are hard. Life on the exponential is challenging!

Nice To Haves

  • Running a metrics or observability system at a company with a large internal customer base
  • Managing external vendor partnerships for observability tooling
  • Observability for ML training or inference workloads
  • Building or operating metrics infrastructure at significant scale

Responsibilities

  • Help grow the Observability team, hiring exceptional software engineers and building a resilient, high-ownership culture
  • Own Anthropic's metrics platform end-to-end — design, reliability, roadmap, and operational excellence
  • Build strong partnerships with internal customers across infrastructure, training, and inference teams to understand needs and manage priorities
  • Partner with the team's technical leads to align on architecture, execution, and hiring
  • Drive operational rigor: make on-call and incident response sustainable, and keep improving both over time

Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • A lovely office space in which to collaborate with colleagues