Founding Engineer - Platform

uRun
United States, CA
$250,000 - $350,000
Remote

About The Position

The problem we saw

AI inference today is slow, expensive, and stateless. Send a query, wait, get a response, reset. That's fine for batch, but AI is becoming interactive, and interactive means inference has to respond instantly, hold context across a session, and be steerable in real time. Nobody had built infrastructure that does all three at once. The bottleneck isn't the models. It's the runtime underneath them.

What we're building to fix it

uRun (Universal Runtime) is the layer that makes real-time, stateful inference possible. Our platform lets AI respond instantly, hold context across a session, and be directed as it runs. We prove it through the hardest problem in the stack: real-time AI video generation. Not pre-rendered clips. Not queued jobs. Live, steerable, continuous video that responds as you speak. Solve that, and the rest of the inference stack follows. That's what we've done. We're an infrastructure company; we build the layer that model labs, builders, and research teams ship on top of.

Requirements

  • 7+ years as an engineer, with a proven track record architecting and owning large-scale production systems
  • Deep Kubernetes expertise, including GPU-heavy clusters (NVIDIA tooling, autoscaling on GPU nodes) and service-mesh patterns
  • Strong cloud and infrastructure-as-code: AWS, GCP, or Azure; Terraform, Pulumi, or equivalent; networking and security (VPC, IAM, API-gateway-style routing)
  • SRE-style thinking and observability depth: Prometheus/Grafana, OpenTelemetry, distributed tracing, SLOs, incident response, and post-mortems
  • Proficiency in at least one of Python, Go, or TypeScript/Node.js for platform tooling, automation, and glue code
  • Experience with streaming or real-time systems: WebRTC, low-latency video pipelines, or comparable latency-sensitive workloads. This is central to the role, not a bonus
  • A track record of mentoring engineers and influencing cross-functional teams

Nice To Haves

  • Hands-on experience with GPU-constrained, memory-bound, or bursty workloads
  • Experience writing custom Kubernetes controllers, scaling logic, or other platform features in-house
  • Early-stage startup experience: owning ambiguous problems end-to-end and setting technical direction with limited scaffolding

Responsibilities

  • Design, operate, and evolve the cloud-native platform that runs uRun's real-time inference and video runtime: Kubernetes, GPU-heavy workloads, and streaming pipelines
  • Own observability, reliability, and performance at scale: SLO-driven capacity, autoscaling, failover, and cost-efficient GPU provisioning
  • Build and maintain the platform primitives that product and ML teams depend on: service meshes, deployment pipelines, secrets and credential management, and configuration-as-code
  • Partner closely with ML and video-workload engineers to optimise for low-latency inference, memory-bound workloads, and streaming data flows
  • Define and champion platform standards for security, observability, and incident response, drawing on SRE-style practices
  • Mentor and unblock other engineers, and act as a technical leader on architecture, trade-offs, and long-term platform evolution

Benefits

  • Competitive salary and meaningful equity in an early-stage AI infrastructure company
  • Health, dental, and vision — full coverage
  • 401(k) — company-supported retirement savings
  • FSA/HSA — flexible spending accounts for healthcare costs
  • Paid time off — we trust you to manage your time
  • Top-tier tooling — access to the best AI tools available: Claude, Codex, Kimi, and whatever else helps you move faster
  • MacBook Pro and AirPods — the hardware you need, on us