Staff Quality & Reliability Engineer

MrBeast•San Francisco, CA

Hybrid

About The Position

We're doing an AI-first engineering rebuild for a company that already has an audience of 100M+ people. This is a zero-to-one build with no legacy constraints, which means you get to set the quality and reliability bar from day one instead of inheriting a decade of flaky tests and silent outages. You're here to make sure we can move fast without lighting production on fire. You'll own how Beast Industries ships software that's both correct and resilient at scale, spanning quality engineering and site reliability across consumer-facing platforms including Step and the creator ecosystem. This is a hands-on expert role, not a people-management one. You set the standards and build the systems other teams rely on.

Requirements

AI-Native: You're already using AI daily, and you have a real point of view on where AI-assisted testing and anomaly detection help versus where they just add noise.
Quality + Reliability Hybrid: Extensive hands-on experience across both software quality engineering and site reliability, including building test-automation architecture and reliability systems for high-traffic distributed production environments.
Production Owner: You've defined SLO/error-budget frameworks and led incident response for severe production events, and you treat every escaped defect as a systemic problem, not an individual fault.
Builder Who Influences: You're a strong enough engineer to build the tooling and review systems-level code, and you move teams through working systems and evidence rather than mandates.
Deep fluency with observability stacks (metrics, logging, distributed tracing), CI/CD pipelines, and cloud infrastructure.

Nice To Haves

Consumer-scale fintech or high-volume media/streaming environments
Chaos engineering
Contributions to open-source reliability/testing tooling

Responsibilities

Define and own the org's test strategy across unit, integration, end-to-end, performance, and chaos/resilience testing, along with the release-readiness criteria for high-risk launches.
Establish SLOs, error budgets, and reliability standards for critical services, and drive product teams to adopt them so they stick across teams.
Build the foundational tooling: CI/CD test gates, regression suites, load-testing harnesses, and observability instrumentation.
Lead incident response for high-severity events, run blameless postmortems, and own the follow-through.
Find the systemic sources of fragility everyone else is treating as one-off incidents, and drive root-cause fixes.
Be the technical authority on go/no-go calls, and make the risk legible to non-technical stakeholders.
Own the build-vs-buy decisions on monitoring, tracing, alerting, and test-automation platforms.
Mentor engineers on reliability thinking without becoming the single point of failure yourself.

Benefits

Highly competitive equity package designed for a foundational hire
Competitive salary
Generous Medical (Blue Cross Blue Shield), Dental, Vision and company-paid Life Insurance
Company contributions to employee Health Savings Accounts (HSA)
401(k) plan with Safe Harbor company matching
Flexible vacation policy and paid company holidays
Company-provided technology package
Relocation assistance where applicable, including travel and company-provided housing for the first 90 days
