Senior Software Engineer - Core Team

Userpilot • Austin, TX

About The Position

Userpilot is a leading product analytics and engagement platform. Hundreds of product teams use us to understand, segment, and activate their users in real time. Under the hood, that's a distributed Elixir/Phoenix backend sustaining hundreds of thousands of concurrent WebSocket connections, high-throughput Kafka event ingestion, ClickHouse analytics at scale, and always-on content delivery. We move fast, we ship often, and we believe the best engineers care as much about how the whole system holds together as about the feature in front of them.

This is the most senior individual-contributor engineering role at Userpilot, and it is a different kind of role. Core Team engineers are the closest thing we have to software architects. They don't own a single feature area; they own how the system fits together, how it behaves under load, and how it recovers when something breaks. They are a rare breed: equally at home in a Terraform module, an application lifecycle, a high-volume database query plan, and an architecture review. They set the technical direction the rest of engineering builds on, they are the first responders when production is on fire, and they design the guardrails that stop a class of problem from ever happening twice. Application squads move fast on features precisely because the Core Team keeps the ground underneath them solid.

And they do all of this in an AI-native way. Coding agents extend their reach across the stack, but the judgment about what is safe, what will scale, and what must never break stays with them.

Requirements

Senior experience designing and operating distributed systems in production, with a track record of being the person who owns how the whole system fits together.
Strong software-engineering and CS fundamentals (data structures, algorithms, system design). You can go deep in application and backend code, not just infrastructure.
Architectural judgment: you reason explicitly about durability, extensibility, robustness, observability, and scalability, and about the tradeoffs between them, and you can write an architecture decision record (ADR) that others can follow.
Distributed-systems instincts: you can break down a complex system to find its failure modes, bottlenecks, and the one change that actually moves the needle.
Calm, methodical incident response: you root-cause under pressure and instinctively turn an incident into prevention.
Hands-on infrastructure: AWS (EKS, EC2, S3, RDS) and the networking that connects them, production Kubernetes and Docker (operating clusters, not just deploying to them), and solid Terraform / Infrastructure as Code.
Observability in practice: Grafana, Prometheus, CloudWatch, and alerting that signals real problems.
Strong communication and influence: this role touches every team, and you drive adoption of patterns across people who don't report to you.
An AI-native workflow: you use AI coding agents (Claude Code, Cursor) as a real part of how you work, and you have a point of view on how to review and trust their output.

Nice To Haves

Elixir, Erlang, or other BEAM systems (our backend is Elixir/Phoenix) and OTP patterns: supervision trees, GenServers, distribution.
Scaling highly available distributed systems in a fast-moving product environment.
Kafka, RabbitMQ, ClickHouse, Broadway, or similar high-throughput data tooling (we use both Kafka and RabbitMQ as brokers).
Building and operating CI/CD that supports high-frequency deployments.
Cloud cost optimization through caching, right-sizing, or more efficient data processing.
Experience as a tech lead, staff engineer, or architect setting direction for an engineering org.
A point of view on the trust model for automated and agent-generated change: automated PRs, agent-triggered deploys, and the gates that make them safe.
Interest in AI-powered observability: anomaly detection, automated runbook execution, or self-healing infrastructure.
Writing technical context documentation (runbooks, ADRs, AGENTS.md-style files) that makes systems understandable to the people and agents joining them.

Responsibilities

Lead system design for cross-cutting and high-risk work, and write and shepherd ADRs the org actually follows.
Partner with application squads to turn product requirements into designs that hold up under load and over time, then get out of their way.
Own production reliability: monitoring, alerting, and on-call practices that surface real problems without drowning the team in noise (Grafana, Prometheus, CloudWatch).
Be first-in on incidents: run the diagnosis, coordinate the fix, write the postmortem, and ship the change that prevents a recurrence.
Design, provision, and operate infrastructure on AWS with Terraform and Kubernetes, with high availability and cost both in mind.
Build and improve CI/CD pipelines and validation gates that make every change trustworthy, whether a human or an agent wrote it.
Write the technical context (ADRs, runbooks, AGENTS.md) that makes the system understandable to new engineers and safe for AI tools.
Keep an eye on infrastructure cost and find the optimizations that actually matter.
Provide technical direction and mentorship across the engineering org.

Benefits

We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other characteristic protected by applicable law. All qualified applicants will receive consideration for employment.
