Senior Site Reliability Engineer

Block

$160,700 - $283,600

About The Position

As a member of the SRE team, you will proactively and reactively improve the reliability of Block's platform and critical infrastructure. You are metrics-driven, systems-oriented, and focused on building distributed platforms that enable safe, scalable product development. You will leverage and continuously improve AI-driven tooling and automation to enhance observability, accelerate incident detection and response, and reduce operational toil. This includes applying AI to incident analysis, alert tuning, and operational workflows.

You will participate in primary platform oncall (12 hours per day, one week every few weeks, depending on team size), supporting Block's most critical (Tier 0) services. In this role, you will lead incident command, coordinate mitigation, and drive effective escalation during high-severity events. This program shifts Block from reactive incident handling to repeatable, system-wide reliability gains: fewer customer-visible incidents, faster response, higher product velocity, and lower burnout across the organization.

We're working to build a more inclusive economy where our customers have equal access to opportunity, and we strive to live by these same values in building our workplace. Block is a proud equal opportunity employer. We work hard to evaluate all employees and job applicants consistently, based solely on the core competencies required of the role at hand, and without regard to any legally protected class.

We believe in being fair, and are committed to an inclusive interview experience, including providing reasonable accommodations to disabled applicants throughout the recruitment process. We encourage applicants to share any needed accommodations with their recruiter, who will treat these requests as confidentially as possible. Want to learn more about what we're doing to build an inclusive workplace? Check out our Inclusion & Diversity page.

Requirements

Drive to root-cause problems in systems with many moving parts and take the necessary steps to fix them
Demonstrated technical initiative and leadership on previous projects, especially those with a backend/platform focus
Familiarity with AI-driven tooling for observability, incident analysis, or automation
A mindset that naturally reaches for AI to accelerate problem-solving and reduce toil
Experience running production oncall for high-availability systems
Strong incident management skills — structured triage, mitigation under pressure, blameless postmortems
Fluency with CI/CD pipelines, progressive rollout strategies, and rollback automation
Monitoring & observability expertise — building/tuning alerts for uptime, error rates, latency regression, and resource exhaustion
Ability to create and maintain evidence-based maturity assessments using trailing 90-day data windows
Comfort with vendor/dependency management, including maintaining validated escalation contacts reachable within 5 minutes
Boundless curiosity, autonomy, and a strong sense of accountability
A strong desire to perform and grow as an engineer
5+ years of software development experience

Responsibilities

Build and extend platforms to improve system reliability
Work on team goals that encompass reliability for the entire company
Standardize reliability tools across multiple platforms and organizations
Triage, coordinate, and lead stabilization of sev 0–1 incidents
Serve as primary oncall, maintaining structured escalation paths and exercising leadership escalation
Drive platform-wide reliability improvements, shared operational tooling, and deploy-safety patterns
Use AI-driven systems to improve signal detection, reduce noise, and accelerate root cause analysis
Design and implement safe deployment patterns (progressive delivery, automated rollback, guardrails)

Benefits

Healthcare coverage (Medical, Vision and Dental insurance)
Health Savings Account and Flexible Spending Account
Retirement Plans including company match
Employee Stock Purchase Program
Wellness programs, including access to mental health, 1:1 financial planners, and a monthly wellness allowance
Paid parental and caregiving leave
Paid time off (including 12 paid holidays)
Paid sick leave: 1 hour per 26 hours worked, up to a maximum of 80 hours per calendar year to the extent legally permissible, for non-exempt employees; covered by our Flexible Time Off policy for exempt employees
Learning and Development resources
Paid Life insurance, AD&D, and disability benefits