Staff Software Engineer, Platform Reliability

Housecall Pro
Posted 10 days ago · $136,000 - $170,000 · Remote

About The Position

Help us build solutions that build better lives. At Housecall Pro, we show up to work every day to make a difference for real people: the home service professionals who support America’s 100 million homes. We’re all about the Pro, and we dedicate our days to helping them streamline operations, scale their businesses, and, ultimately, save time so they can be with their families and live well. We care deeply about our customers and foster a culture where our company, employees, and Pros grow and succeed together. Leadership is as focused on growing team members’ careers as it expects teams to be on creating solutions for Pros.

As a Staff Software Engineer on the Site Reliability Engineering (SRE) team at Housecall Pro, you are a software developer first. Your mission is to improve the reliability, performance, and resilience of our production systems through better code, better design, and better runtime behavior.

This role is deeply hands-on with service code. You regularly read, debug, and reason about feature team implementations to understand how they behave in production: under load, during failures, and over time. You use telemetry (metrics, logs, traces, and database signals) as a lens into code-level behavior, helping teams connect what they see in dashboards and alerts back to the specific design decisions and implementation details that caused them.

Operating within a Product SRE model, you partner closely with feature teams to drive engineering change. Rather than “owning production for them,” you help teams own and improve their services by identifying root causes, suggesting concrete code and architectural improvements, and helping implement patterns that lead to more reliable systems. While you contribute to and maintain observability and reliability tooling, that tooling exists to serve a single purpose: enabling engineers to better understand and improve their software.

You are comfortable participating in on-call and incident response, but your primary focus is on reducing future incidents through better engineering, not on reacting to the same failures repeatedly. This role is ideal for a senior software engineer who enjoys distributed systems, production debugging, and collaborative problem-solving across teams.

Founded in 2013, Housecall Pro helps home service professionals (Pros) streamline every aspect of their business. With easy-to-use tools for scheduling, dispatching, payments, and more, Housecall Pro enables Pros to save time, grow profitably, and provide best-in-class service. Housecall Pro’s brand portfolio includes Business Coaching by Housecall Pro, a business coaching solution for home services businesses. Our brands are united by a singular mission to champion our Pros to success. We support more than 40,000 businesses and have over 1,800 ambitious, mission-driven, genuinely fun-loving teammates across the globe. If you want to do work that impacts real people, supported by a team that will invest in you every step of the way, we’d love to hear from you.

Requirements

  • 6–9+ years of experience as a Software Engineer, with significant exposure to operating production systems.
  • Strong proficiency in reading, debugging, and improving large backend codebases.
  • Experience building and operating distributed systems or service-oriented architectures.
  • Solid understanding of performance engineering, failure modes, and reliability fundamentals at the code and system level.
  • Hands-on experience with observability tools (metrics, logging, tracing) and using them to diagnose code-level issues.
  • Experience working with relational databases (e.g., MySQL, PostgreSQL), including query optimization and schema design.
  • Strong knowledge of Kubernetes, container orchestration, and cloud-native runtime environments.
  • Experience participating in incident response and production on-call rotations.
  • Strong communication skills and the ability to work collaboratively with feature teams on shared codebases.

Nice To Haves

  • A strong identity as a software engineer who enjoys understanding how systems behave in the real world.
  • Ability to connect high-level production symptoms to low-level implementation details.
  • Comfort giving concrete, actionable feedback on code and architectural design.
  • A curiosity-driven approach to debugging complex systems and uncovering root causes.
  • An SLO-oriented mindset that prioritizes user impact and measurable reliability outcomes.
  • Bias toward durable code and design improvements over configuration-only fixes.
  • Empathy for feature teams and a collaborative approach to driving change.
  • Passion for building systems that are understandable, observable, and resilient under failure.
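
An SLO-oriented mindset usually comes down to error-budget arithmetic: an availability target of 99.9% over a 30-day window (43,200 minutes) leaves 0.1% of that window, about 43 minutes, for failures. A minimal Python sketch of that calculation, plus a request-based variant, is below; the function names are hypothetical illustrations and not part of any Housecall Pro tooling.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed unavailability for a time-based availability SLO.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative once overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures the SLO tolerates for this much traffic.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# 99.9% over 30 days leaves 43 minutes 12 seconds of budget.
print(error_budget(0.999, timedelta(days=30)))  # 0:43:12
# 250 failures out of 1,000,000 requests spends a quarter of the budget.
print(round(budget_remaining(0.999, 1_000_000, 250), 3))  # 0.75
```

Tracking the remaining fraction over time, rather than raw error counts, is what lets a team decide whether to keep shipping features or spend the next sprint on reliability work.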

Responsibilities

  • Dive into service codebases to understand how implementation details, data access patterns, and architectural choices affect production behavior.
  • Use metrics, logs, traces, and database telemetry to trace production issues back to specific code paths, queries, or design decisions.
  • Partner with feature teams to debug complex reliability and performance issues, proposing concrete code changes and architectural improvements.
  • Suggest and help implement improvements such as safer concurrency models, more efficient algorithms, better resource usage, and clearer service boundaries.
  • Help teams adopt resilient coding patterns, including retries with backoff, circuit breakers, bulkheads, idempotency, and graceful degradation.
  • Lead or contribute to post-incident reviews, translating operational failures into actionable engineering improvements.
  • Design and evolve observability tooling that makes it easier for engineers to reason about code-level behavior in production.
  • Review service and database interaction patterns to reduce latency, contention, and unnecessary load.
  • Collaborate on database-related improvements, including schema design, query optimization, migration strategies, and scaling approaches.
  • Contribute to reliability standards such as SLOs, service readiness expectations, and reliability scorecards.
  • Mentor engineers by modeling strong debugging practices, thoughtful system design, and ownership of production software.
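
Several of the resilient coding patterns listed above fit in a few lines of code. As one illustration, here is a minimal Python sketch of retries with capped exponential backoff and full jitter. It assumes a hypothetical `TransientError` type that marks failures safe to retry, and it is not Housecall Pro code. Because a retry can repeat work the server already completed, it should only wrap idempotent operations (or requests that carry an idempotency key).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (e.g. timeouts)."""


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying TransientError with capped exponential backoff and full jitter.

    Only wrap idempotent operations: a retried call may repeat work that
    already succeeded before the caller saw the error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential growth (base, 2*base, 4*base, ...) capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: pick a random delay in [0, cap] so clients don't retry in sync.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: an operation that times out twice, then succeeds.
attempts = {"n": 0}


def flaky_lookup() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("timeout")
    return "ok"


print(retry_with_backoff(flaky_lookup, sleep=lambda _: None))  # ok
```

Injecting `sleep` keeps the helper testable without real delays. Full jitter spreads retries from many clients across the whole backoff interval, so they do not hammer a recovering dependency in lockstep.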

Benefits

  • A generous benefits program that supports the whole you with medical, dental, vision, life, disability, and 401(k)
  • Paid holidays and flexible, take-it-as-you-need-it paid time off
  • Equity in a rapidly growing startup backed by top-tier VCs
  • Monthly tech reimbursements
  • A culture built on innovation that values big ideas, no matter where they come from: we are always open to new ideas that will improve the lives of our Pros!
  • Remote environment: built to make you feel that we are all together in one space without leaving your home office!
  • Self-managed PTO: Beach? Mountains? Camping? Discovering new experiences? You are free to take time off as you need!
  • Flexible work hours: We believe that you can reach your professional and personal goals working with us, and we encourage you to keep a healthy work-life balance!
  • MacBook (or PC if you prefer!) + Setup Fee ($500): What is remote work without the right tools? Here at HCP, you can choose your computer and set up your home office!
  • Employee assistance program
  • Paid parental leave

What This Job Offers

  • Job type: Full-time
  • Career level: Mid Level
  • Education level: No Education Listed
  • Number of employees: 1,001-5,000

© 2024 Teal Labs, Inc