Manager, Engineering (Production Orchestration)

Cockroach Labs | New York, NY
$194,000 - $257,300 | Hybrid

About The Position

Category-defining tech. Career-defining work. Lots of tech companies disrupt, but many fail when they try to scale. We're different. CockroachDB makes it easier for companies to build and scale apps. This is how and why we're helping some of the most innovative companies on the planet. We tackle problems head-on and focus on solutions that create lasting impact. Because when our customers win, we all win.

The Role

At the heart of CockroachDB is our Production Orchestration team, the stewards of availability, reliability, and scalability across our cloud offerings and beyond. Built on a foundation of SRE principles and carrying forward years of operational practice, our core commitment is clear: ensuring our customers have a secure, reliable, and performant production service at scale.

We're looking for an Engineering Manager to lead our Production Orchestration team as part of a global Production Engineering organization. You'll drive foundational architectural changes to how we operate our fleet, champion AI-driven approaches to both development and operations, and foster a culture of operational excellence, ensuring CockroachDB meets and exceeds our SLAs while keeping pace with rapid growth.

You'll report to Tom Schmidt, Director of Production Engineering, who has led this team for 4+ years and will continue to be deeply involved in its technical direction. You'll be responsible for the growth and development of the team's engineers, day-to-day execution, and operational health, while bringing your own leadership and ideas to the table.

Requirements

  • A passion for building relationships and a deep sense of responsibility for the welfare of the engineering team you manage, including their professional development and growth. We're looking for managers who want to empower their team to achieve their professional and personal goals.
  • Experience leading global operations and/or incident management and response.
  • Experience working on complex technical products with exposure to distributed systems, cloud infrastructure, container orchestration, or large-scale fleet management.
  • A strong SRE or Production Engineering background. You understand the principles of reliability engineering, SLOs/SLAs, error budgets, and the engineering approach to operations.
  • Comfort with programming languages like Go and Python. We use Go, but if you don't know it, you'll learn while you're here.
  • Solid systems architecture knowledge and an understanding of how a variety of teams' interactions may impact operational reliability.
  • Experience with performance management, understanding the importance of building an effective team that can function independently while collaborating and supporting each other.
  • Experience partnering across departments and coordinating with internal teams and external partner teams across time zones.

Nice To Haves

  • Experience growing or managing teams that coordinate across multiple time zones.
  • Experience supporting workloads across multiple cloud providers (GCP, AWS, Azure).
  • Experience leveraging or, even better, building observability tooling for your team and the rest of your org.
  • Experience applying AI/ML to operational workflows (e.g., intelligent alerting, automated remediation, capacity forecasting).
  • Familiarity with CockroachDB or distributed SQL databases.

Responsibilities

  • Lead the Production Orchestration team, focused on the reliability, availability, and scalability of CockroachDB in production.
  • Own operational excellence. Ensure the team is meeting or exceeding our SLAs, running effective incident response, and continuously improving our operational posture. Every incident is treated as a learning opportunity.
  • Partner across the global Production Engineering organization to align on shared goals, ensure smooth coordination across time zones, and drive cohesive execution.
  • Drive automation and tooling. Relentlessly reduce operational toil by building systems that improve observability and scale our fleet without scaling headcount linearly.
  • Leverage AI to improve how the team builds and operates. Help the team adopt AI-assisted development practices and identify applied AI opportunities to improve operational workflows, from alert triage to capacity planning to incident response.
  • Contribute to foundational architecture. The team is building a new architectural initiative that will reshape how we operate our fleet. You'll help lead execution on this work and ensure the team has the space and support to deliver.
  • Coach and develop your engineers. Provide direct, constructive feedback. Guide personal development and career growth beyond just technical skills. Managing performance and ensuring engineers are achieving their goals are essential to retaining a high-performing team.
  • Partner with engineering and product leadership to shape the roadmap for CockroachDB's operational capabilities and future products.
  • Collaborate across teams to build and establish the tools and processes that empower everyone to make our customers successful.

Benefits

  • Stock Options
  • Medical Insurance
  • Vision Insurance
  • Dental Insurance
  • Life and Disability Insurance
  • Professional Development Funds
  • Flexible Time Off
  • Paid Holidays
  • Paid Sick Days
  • Paid Parental Leave
  • Retirement Benefits
  • Mental Wellbeing Benefits
  • 401(k) plan