Senior Manager, Observability

CoreWeave
Sunnyvale, CA
Hybrid

About The Position

The Observability Engineering organization at CoreWeave is responsible for the platforms and practices that help engineers understand, operate, and improve production systems at scale. This team owns and evolves the foundations for metrics, logs, traces, telemetry pipelines, and observability reliability, enabling teams to detect issues quickly, troubleshoot complex distributed systems, and operate AI infrastructure with confidence. As CoreWeave continues to scale, observability plays a critical role in delivering reliable platform experiences, improving engineering velocity, and maintaining operational excellence across a rapidly growing cloud environment.

CoreWeave is seeking a Senior Manager, Observability Engineering to lead a team responsible for building, scaling, and operating observability systems across metrics, logs, traces, and telemetry pipelines. In this role, you will define strategy and roadmap, drive platform reliability and performance improvements, and guide architectural decisions across observability infrastructure. You will partner closely with infrastructure, platform, security, and application engineering teams to improve instrumentation and production visibility. This role combines technical leadership, operational ownership, and team management to ensure observability platforms scale with business and customer needs.

Requirements

  • 8+ years of software engineering experience with production systems at scale
  • 4+ years of engineering management experience leading senior engineers and technical leads
  • Experience building and operating observability platforms across logs, metrics, traces, and alerting in distributed systems
  • Knowledge of reliability engineering concepts including SLOs, SLIs, incident management, error budgets, and fault-tolerant design
  • Experience scaling telemetry systems including collection pipelines, storage backends, and query layers
  • Experience with distributed systems, performance engineering, and trade-offs involving scale, resilience, and cost
  • Experience partnering with infrastructure, security, and application engineering teams to drive platform adoption
  • Experience hiring and managing engineering teams

Nice To Haves

  • Experience with OpenTelemetry, Grafana, Prometheus-compatible systems, log aggregation, and distributed tracing tools
  • Experience operating cloud-native infrastructure, including Kubernetes environments
  • Experience supporting large-scale cloud, developer platforms, or AI/ML infrastructure
  • Familiarity with capacity planning for high-ingest telemetry systems
  • Experience scaling platforms in high-growth environments

Responsibilities

  • Lead a team responsible for building, scaling, and operating observability systems across metrics, logs, traces, and telemetry pipelines.
  • Define strategy and roadmap for observability systems.
  • Drive platform reliability and performance improvements.
  • Guide architectural decisions across observability infrastructure.
  • Partner closely with infrastructure, platform, security, and application engineering teams to improve instrumentation and production visibility.
  • Ensure observability platforms scale with business and customer needs.

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to participate in the Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption

What This Job Offers

Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed
Number of Employees: 251-500 employees
