Infra Engineer - API

General Intuition & Medal • New York, NY

4d • $250,000 - $400,000 • Onsite

About The Position

General Intuition is a frontier research lab building foundation models for environments that require deep spatial and temporal reasoning. The company has raised $133M from General Catalyst and Khosla to develop next-generation AI agents, world models, and video understanding models.

This role is for an Infra Engineer to own the company's API and turn research models into a production-ready service that is low-latency, highly available, reliable, and scalable. The engineer will work directly with the founding team and have end-to-end ownership of the API: client libraries, frame reception and action streaming, request routing to GPUs, session management, Kubernetes cluster deployment, and GPU fleet scaling. This is a generalist infrastructure role that requires expertise in both API development and GPU infrastructure.

Requirements

A track record of personally scaling a high-traffic, low-latency API in production.
Deep Kubernetes experience, including multi-region deployments.
Comfort with SLOs and capacity planning.
Strong ownership instinct, with experience taking systems end-to-end.

Nice To Haves

Experience deploying streaming video or audio inference models.
Experience with low-latency game streaming or video streaming infra.
Experience scaling GPU fleets across providers (GCP, CoreWeave, Lambda, etc.).
Experience with frontier model inference (LLMs, world models, multimodal).
Experience with on-device or edge inference (ExecuTorch, Core ML, etc.).

Responsibilities

Own the video streaming protocol, including frame reception from clients and efficient routing to servers.
Own the runtime layer of the API, encompassing stateful request routing, GPU session lifecycle, and inference orchestration.
Scale the Kubernetes footprint across multiple regions and lead new regional deployments.
Own the GPU hosting strategy, scaling from dozens to thousands of GPUs while managing costs and latency.
Drive improvements in latency and throughput for inference.
Partner with product engineering on developer-facing reliability, observability, metering, and billing-grade uptime.