Software Engineer, Reliability Platform

DoorDash USA
Sunnyvale, CA
$130,600 - $192,000

About The Position

The Reliability Platforms organization is part of DoorDash’s Production Lifecycle team, which owns the end-to-end experience of how engineers safely change, observe, and operate production systems. Our mission is to enable teams to confidently make changes to production, understand reliability and service health on demand, and abstract complexity into platforms for common operations, with built-in guardrails that make safe, repeatable operations the default.

Reliability Platforms builds platforms as products that touch every production change, with two primary areas of focus:

  • Self-Serve Infrastructure & Configuration Change Control: building the systems engineers use to provision services, request cloud resources, and safely make config changes across traffic, compute, and secrets.
  • Reliability & Service Health: delivering unified health scores, SLOs, alerting pipelines, and automation that help engineers know what’s happening, improve reliability, and act quickly when something goes wrong.

Together, these focus areas form the backbone of how DoorDash engineers safely ship, observe, and remediate production systems.

Requirements

  • Experience in software development, particularly in Go.
  • Strong understanding of infrastructure and configuration management.
  • Ability to design and implement APIs and user interfaces.
  • Experience with automation and incident management systems.
  • Familiarity with reliability engineering principles and practices.

Nice To Haves

  • Experience with AI-assisted operations and workflows.
  • Knowledge of cloud services and resource provisioning.
  • Familiarity with SLOs and health monitoring tools.

Responsibilities

  • Design and develop systems in Go that let engineers safely request infra, configure services, and manage production state.
  • Add guardrails, validation, and progressive rollout capabilities for infra and config changes.
  • Provide pre-flight checks, posture scoring, and unified health views to catch issues before they reach production.
  • Contribute to systems that remediate incidents automatically or guide engineers through resolution quickly.
  • Help evolve our UIs and APIs into a single entry point for production change and health insights.
  • Experiment with agentic, AI-assisted workflows that can propose, validate, and safely execute production changes.

Benefits

  • 401(k) plan with employer matching
  • 16 weeks of paid parental leave
  • Wellness benefits
  • Commuter benefits match
  • Paid time off and paid sick leave
  • Medical, dental, and vision benefits
  • 11 paid holidays
  • Disability and basic life insurance
  • Family-forming assistance
  • Mental health program