Staff Backend Engineer, Gatekeeper

Helius

50d • Remote

About The Position

We're looking for a Staff Backend Engineer to own and evolve Gatekeeper, Helius’s high-performance edge gateway and middleware layer. Gatekeeper is the single entry point for JSON-RPC, WebSockets, and Helius APIs, and it exists to give users anywhere in the world latency and reliability comparable to running a dedicated node. In this role, you will lead architectural decisions across routing, connection management, backpressure, and observability. You will work closely with internal service teams to improve end-to-end performance and failure handling, and to make Gatekeeper safer to operate at high scale.

Requirements

Significant experience building and operating high-throughput backend systems in production (proxies, gateways, distributed services, or infra-heavy platforms).
Deep understanding of networking fundamentals and HTTP behavior (TLS, TCP, connection reuse, proxies, load balancers, timeouts).
Strong performance engineering skills: profiling, benchmarking, and making latency/throughput tradeoffs with rigor.
Track record of leading ambiguous, cross-team projects and shipping durable systems.
Operational excellence: you have owned services with real on-call responsibility, and you make them easier to run over time.
Excellent communication: you can write clear design docs, align stakeholders, and make decisions legible.

Nice To Haves

Rust experience (or strong interest in working close to the metal for performance-critical systems).
Experience with anycast, multi-region traffic management, or edge deployments.
Familiarity with WebSockets at scale and the operational challenges that come with long-lived connections.
Experience building internal platforms that standardize observability, incident response, and service reliability.
Interest in Solana / crypto infrastructure, market data, or latency-sensitive trading systems.

Responsibilities

Lead the technical direction for Gatekeeper as the unified entry point for Helius traffic, with an emphasis on p50/p99 latency and tail reliability.
Design and implement routing and load balancing strategies across regions and backend pools, including failover behavior and graceful degradation.
Improve connection handling end-to-end: TLS termination, keepalives, pooling, timeouts, backpressure, and request/response streaming behavior.
Build robust, operator-friendly observability: SLOs, dashboards, alerts, and “is it healthy?” views that make issues diagnosable fast.
Partner with internal service teams to define and enforce contracts (timeouts, retries, error mapping, capacity signals), and reduce systemic failure modes.
Drive hardening work across security and abuse controls (auth failure behavior, rate limiting and cap enforcement, request validation).
Own production operations for Gatekeeper: incident response, on-call improvements, runbooks, and post-incident follow-through.
Mentor engineers and raise the bar on performance engineering, operational rigor, and code quality.