The roadmap isn't handed to you here. You'll help write it, and you'll be the reason it stays up.

As a Staff Software Engineer focused on Infrastructure at Wisdom, you'll set the technical direction for reliability across the company and own the systems behind the systems: the deploy pipeline, the observability, the capacity controls, and the failure handling that decide whether our agentic billing infrastructure quietly does its job or pages someone at 2am.

This is a force-multiplier role on a small, high-trust team. Your job isn't just to fix what breaks; it's to make the whole organization operate at a higher reliability bar: to build the practices, the guardrails, and the instincts that mean fewer things break in the first place, and that let the team handle the ones that do without you in the room.

Wisdom's stack is TypeScript, Node.js, React, Postgres, and AWS, with LLM-driven agents (Mastra, Anthropic) making high-stakes billing decisions in production. The problems we're solving (keeping inconsistent insurance integrations alive, making AI pipelines fail safe instead of failing loud, running HIPAA-compliant infrastructure that genuinely can't go down) are legitimately hard. We'd rather have someone energized by making things not break than someone who merely tolerates being paged when they do.

By the end of your first year, you'll have defined what reliability means at Wisdom and built the function to deliver it: a real observability and SLO practice, an incident process that runs without heroics, agentic pipelines that degrade gracefully instead of taking prod down with them, and a team that's measurably better at operating production because of how you've raised the bar.

This is a fully remote role reporting directly to the Head of Engineering.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed