Staff Software Engineer

Fetch

Remote

About The Position

Fetch’s Core Services team is building the next generation of support experiences: powered by an LLM, grounded in trusted data, and designed with safety and accountability from day one. This work is user-facing. It directly shapes how quickly and accurately customers get help, and how confidently agents can resolve issues.

We’re hiring a Staff Backend Engineer to design and evolve the systems behind an LLM-enabled support toolchain. You’ll build the backend architecture that allows a chatbot or LLM to make personalized, data-driven determinations about ticket type and recommended resolution. All actions, such as awarding points, will initially be routed to human agents for review and approval. Recommendations will be evaluated for accuracy, monitored over time, and progressively unlocked. In a later phase, ticket types that consistently meet accuracy and safety thresholds can be resolved end to end through automation, with strong safeguards and auditability.

This is a high-impact role at the intersection of backend systems, data, and applied AI, where reliability, observability, and responsible automation are non-negotiable. It is a full-time role that can be held from one of our US offices or remotely in the United States.

Requirements

8+ years of experience designing, building, and operating backend systems that support critical, user-facing workflows at scale.
Demonstrated ownership of backend architecture for complex, multi-service systems, including long-term evolution, failure modes, and operational maturity.
Deep expertise in API design and service boundaries for decisioning systems that ingest signals, enrich context, and produce deterministic, explainable outcomes.
Proven experience building secure data access layers that aggregate sensitive customer data with strict authorization, PII minimization, and auditable access patterns.
Strong track record of designing systems with explicit human-in-the-loop controls, approvals, and escalation paths for high-impact actions.
Hands-on experience instrumenting production systems with end-to-end observability, including traceability of decisions, accuracy signals, and error analysis.
Ability to independently translate ambiguous product and policy requirements into robust backend designs with clear risk tradeoffs.
Demonstrated influence beyond direct ownership, including mentoring senior engineers and setting engineering standards for reliability, safety, and maintainability.

Nice To Haves

Direct experience building backend platforms that integrate LLMs or ML models into real-time decisioning or recommendation workflows.
Experience defining and operationalizing accuracy, confidence scoring, and acceptance thresholds for automated or semi-automated systems.
Background in building systems that transition from human-reviewed actions to gated automation with explicit risk controls and rollback mechanisms.
Experience in domains where correctness, explainability, and auditability are mandatory, such as support tooling, trust & safety, fraud, payments, or compliance.
Familiarity with retrieval patterns that safely provide contextual data to models, such as RAG, tool invocation, or structured prompting pipelines.
Experience designing systems that support offline evaluation, shadow mode execution, and controlled production experiments.
Prior Staff- or Principal-level experience driving architectural direction across teams or platforms.

Responsibilities

Design and scale core services for LLM-driven support tooling by building modular systems that classify customer issues, recommend ticket types, and propose resolution paths using real-time, trusted data.
Build user-facing and agent-facing APIs for support decisioning that provide consistent ticket intake, enrichment, routing, and recommendation outputs across automated chat surfaces and internal agent tools.
Implement a secure customer context retrieval layer by building services that assemble only the necessary contextual data with strict access controls, PII minimization, and auditing, enabling safe, data-driven personalization.
Build human-in-the-loop workflows for action approval by designing mechanisms that route LLM recommendations to agents for review with clear rationale, supporting evidence, and suggested responses, ensuring humans remain the final approvers in early phases.
Develop evaluation, monitoring, and observability for LLM-powered support by instrumenting tracing, structured logs, and metrics to monitor end-to-end flows including latency, success rate, recommendation acceptance rate, accuracy, escalation rate, and error modes, and by enabling offline evaluation and controlled online experiments.
Enable phased automation with confidence gates and risk controls by building a path to no-human-in-the-loop execution for specific ticket types only after defined accuracy and safety thresholds are met, with safeguards including confidence scoring, tiered permissions, rate limiting, anomaly detection, kill switches, and comprehensive audit trails.
Partner across Support, Product, Data Science/ML, and Risk/Compliance to define what “accuracy” and “safe automation” mean, establish review processes, align on policies, and ensure the system meets operational and regulatory requirements.
Raise the bar on backend architecture, reliability, privacy and security, and responsible automation patterns, helping the team build systems that are both fast to iterate and safe to deploy at scale.

Benefits

Equity: We offer employees equity in Fetch, so that everyone can benefit from Fetch’s growth.
401k Match: Dollar-for-dollar match up to 4%.
Benefits for humans and pets: We offer comprehensive medical, dental, and vision plans for everyone, including your pets.
Continuing Education: Fetch provides $10,000 per year in education reimbursement.
Employee Resource Groups: Take part in employee-led groups that are centered around fostering a diverse and inclusive workplace through events, dialogue and advocacy. The ERGs participate in our Inclusion Council with members of executive leadership.
Paid Time Off: On top of our flexible PTO, Fetch observes 9 paid holidays, including Juneteenth and Indigenous Peoples’ Day, as well as our week-long year-end break.
Robust Leave Policies: 20 weeks of paid parental leave for primary caregivers, 14 weeks for secondary caregivers, and a flexible return to work schedule.
Calvin Care Cash: Employees welcoming new family members also receive a one-time $2,000 incentive to help cover the cost of childcare, clothing, diapers, and much more!
Flexible Work Environment: Collaborate with your team in one of our stunning offices in Madison, Birmingham, or Chicago. Or you can work fully remotely from anywhere in the US. We’ll ensure you are equally equipped with the hardware and software you need to get your job done in the comfort of your home.
