Senior Platform Reliability Engineer

Grow Therapy
San Francisco, CA
$182,000 - $250,000
Hybrid

About The Position

Grow Therapy is seeking a Senior Platform Reliability Engineer to help define and scale reliability as a first-class capability. This role operates horizontally across the organization, shaping how reliability is understood, measured, and built into the developer experience. The engineer will work closely with the platform team and product engineering teams to establish standards for observability, SLOs/SLAs, and incident response, translating these into self-service tooling and "golden paths." This is a high-impact, highly autonomous role focused on driving both cultural and technical change to enable teams to independently build and operate reliable systems at scale.

Requirements

  • 6+ years of experience operating and improving reliability of production systems at scale.
  • Hands-on experience with AWS, Kubernetes (e.g., EKS), and infrastructure as code tools like Terraform.
  • Experience defining or working with SLOs/SLAs, an understanding of error budgets, and experience improving reliability through measurement and iteration.
  • Experience with modern observability tooling (e.g., Datadog) and an understanding of how to build actionable monitoring systems across metrics, logs, and traces.
  • Ability to zoom out, identify patterns across teams and services, and design solutions that scale beyond a single system.
  • A focus on outcomes over output, with genuine care for improving real reliability outcomes rather than just adding processes.
  • Ability to drive change across teams without direct authority, balancing pragmatism with long-term vision.
  • Ability to thrive in ambiguous environments, with comfort defining problems, proposing solutions, and executing independently.
  • Ability to collaborate well, communicate with empathy, and enjoy mentoring and learning from others.

Nice To Haves

  • Experience introducing or scaling reliability practices in a growing organization.
  • Experience building internal tooling or platforms used by multiple teams.
  • Experience designing service-level scorecards or compliance/reporting systems.
  • Experience with both SaaS (e.g., Datadog) and self-managed observability stacks.
  • Prior experience as a product engineer, bringing empathy for developer experience.
  • Experience with database reliability and performance (e.g., PostgreSQL).

Responsibilities

  • Defining Reliability Standards: Establishing frameworks for SLOs/SLAs, error budgets, and operational readiness; helping teams understand what to measure and why it matters.
  • Improving Observability & Measurement: Identifying gaps in metrics, logging, and tracing; ensuring services are measurable, debuggable, and aligned with reliability goals.
  • Evolving Incident Response: Developing and improving incident response practices, from detection to post-incident learning, and helping teams build sustainable on-call and escalation patterns.
  • Enabling Self-Service Reliability: Partnering with the platform team to build tooling and abstractions (e.g., service scorecards, dashboards, templates, golden paths) that make it easy for teams to adopt and stay compliant with reliability standards.
  • Driving Adoption Across Teams: Working cross-functionally to educate, influence, and guide engineering teams—scaling reliability practices through a combination of clear standards, strong communication, and developer-friendly systems.

Benefits

  • Comprehensive Health Coverage: Medical, dental, and vision insurance, plus life and disability coverage.
  • Parental Leave & Family Support: Up to 18 weeks paid leave and a new child stipend.
  • Financial Wellness: 401(k) program and equity opportunities.
  • Meals & Home Office Support: Stipends for home office setup and ongoing funds for meals, with tailored perks for both remote and in-office employees.
  • Time Off to Recharge: Flexible PTO, 12 paid holidays, and a full winter break week.
  • Wellness & Development: Annual stipends to put towards personal & professional growth.
  • Mental & Physical Health Support: No-cost access to therapy through the Grow platform, weekly flexible hours for self-care (“Mental Health Mornings/Afternoons”), and memberships to leading wellness services (such as One Medical, Headspace, and Talkspace).
  • Extra Perks: Pet insurance discounts, commuter benefits, and global travel assistance.