Founding Engineer - Platform

uRun•United States, CA

5d•$250,000 - $350,000•Remote

About The Position

The problem we saw

AI inference today is slow, expensive, and stateless. Send a query, wait, get a response, reset. That's fine for batch, but AI is becoming interactive, and interactive means inference has to respond instantly, hold context across a session, and be steerable in real time. Nobody had built infrastructure that does all three at once. The bottleneck isn't the models. It's the runtime underneath them.

What we're building to fix it

uRun (Universal Runtime) is the layer that makes real-time, stateful inference possible. Our platform lets AI respond instantly, hold context across a session, and be directed as it runs. We prove it through the hardest problem in the stack: real-time AI video generation. Not pre-rendered clips. Not queued jobs. Live, steerable, continuous video that responds as you speak. Solve that, and the rest of the inference stack follows. That's what we've done. We're an infrastructure company; we build the layer that model labs, builders, and research teams ship on top of.

Requirements

7+ years as an engineer, with a proven track record architecting and owning large-scale production systems
Deep Kubernetes expertise, including GPU-heavy clusters (NVIDIA tooling, autoscaling on GPU nodes) and service-mesh patterns
Strong cloud and infrastructure-as-code: AWS, GCP, or Azure; Terraform, Pulumi, or equivalent; networking and security (VPC, IAM, API-gateway-style routing)
SRE-style thinking and observability depth: Prometheus/Grafana, OpenTelemetry, distributed tracing, SLOs, incident response, and post-mortems
Proficiency in at least one of Python, Go, or TypeScript/Node.js for platform tooling, automation, and glue code
Experience with streaming or real-time systems: WebRTC, low-latency video pipelines, or comparable latency-sensitive workloads. This is central to the role, not a bonus
A track record of mentoring engineers and influencing cross-functional teams

Nice To Haves

Hands-on experience with GPU-constrained, memory-bound, or bursty workloads
Experience writing custom Kubernetes controllers, scaling logic, or other platform features in-house
Early-stage startup experience: owning ambiguous problems end-to-end and setting technical direction with limited scaffolding

Responsibilities

Design, operate, and evolve the cloud-native platform that runs uRun's real-time inference and video runtime: Kubernetes, GPU-heavy workloads, and streaming pipelines
Own observability, reliability, and performance at scale: SLO-driven capacity, autoscaling, failover, and cost-efficient GPU provisioning
Build and maintain the platform primitives that product and ML teams depend on: service meshes, deployment pipelines, secrets and credential management, and configuration-as-code
Partner closely with ML and video-workload engineers to optimise for low-latency inference, memory-bound workloads, and streaming data flows
Define and champion platform standards for security, observability, and incident response, drawing on SRE-style practices
Mentor and unblock other engineers, and act as a technical leader on architecture, trade-offs, and long-term platform evolution

Benefits

Competitive salary and meaningful equity in an early-stage AI infrastructure company
Health, dental, and vision — full coverage
401(k) — company-supported retirement savings
FSA/HSA — flexible spending accounts for healthcare costs
Paid time off — we trust you to manage your time
Top-tier tooling — access to the best AI tools available: Claude, Codex, Kimi, and whatever else helps you move faster
MacBook Pro and AirPods — the hardware you need, on us
