Founding ML Infrastructure Engineer

uRun
United States, CA
$200,000 - $350,000

About The Position

uRun is building the next generation of AI inference infrastructure, focusing on the compute layer that makes real-time, stateful inference possible at scale. As a founding ML Infrastructure and Platform Engineer, you will own the architecture and scaling of our GPU compute platform from the ground up. This is a founding technical hire with end-to-end ownership across the full infrastructure stack, from bare metal to model serving. You will work directly with the founding team and define how we build.

Requirements

  • Proven experience designing and operating large-scale distributed infrastructure at 1,000+ nodes or equivalent complexity, in any domain
  • Deep expertise in distributed systems, cluster orchestration (Kubernetes, Slurm, or custom schedulers), and large-scale resource scheduling
  • Strong production reliability instincts: observability, incident response, capacity planning, and SLA ownership across complex systems
  • Experience building infrastructure that other engineers build on top of, not just operating it
  • Ability to operate as a technical lead: set direction, make tradeoffs under uncertainty, and raise the bar for the team around you
  • Startup orientation: you are energised by ambiguity, move fast, and build for scale from day one

Nice To Haves

  • Exposure to ML infrastructure concepts: GPU networking (NCCL, InfiniBand, RoCE), model serving frameworks (vLLM, SGLang, TensorRT-LLM), or hardware-aware performance tuning (CuTe, Triton, TileLang)
  • Experience with multi-cloud GPU procurement and capacity management across AWS, GCP, Azure, and bare metal providers
  • Familiarity with inference marketplace architectures, dynamic routing, or spot/preemptible workload management
  • Prior experience at a Series A or earlier stage company scaling from early infrastructure to production

Responsibilities

  • Design and scale our GPU compute platform to support clusters of 1,000+ GPUs, ensuring high availability and low-latency inference across the fleet
  • Build and maintain the infrastructure layer for our compute marketplace, including multi-tenant scheduling, isolation, and billing-aware resource allocation
  • Own production reliability for ML systems end-to-end: observability, incident response, and SLA achievement across model serving and infrastructure
  • Architect feature stores and model registry systems that support rapid iteration and reproducibility at scale
  • Design experiment-tracking infrastructure capable of handling thousands of concurrent runs with full auditability
  • Build resource orchestration and scheduling systems that optimise for throughput, cost, and latency across heterogeneous hardware
  • Set engineering standards for infrastructure reliability, capacity planning, and operational excellence as an early technical leader

Benefits

  • Competitive salary and meaningful equity
  • Health, dental, and vision — full coverage
  • 401(k) — company-supported retirement savings
  • FSA/HSA — flexible spending accounts for healthcare costs
  • Paid time off — we trust you to manage your time
  • Top-tier tooling — access to the best AI tools available: Claude, Codex, Kimi, and whatever else helps you move faster
  • MacBook Pro and AirPods — the hardware you need, on us