We build environments for AI agents: systems that measure and improve their productive output.

What You'd Work On

Agent orchestration at scale. Hundreds of concurrent agent runs, each with its own stateful environment, and 100M tokens per minute across the fleet. You own the dispatch layer: SQS, concurrency control, and failure handling.

Environment and task design. We need environments that feel real and scenarios that actually push agents to their limits. You'd figure out how to build new evaluations and design the tasks that test what matters, not just what's easy to measure.

New frontiers. The agent evaluation space is moving fast. You'd stay at the leading edge, supporting new environment modalities and shipping integrations with external orchestration frameworks.

Observability. Prometheus and OpenTelemetry across services, Grafana dashboards, and structured logging.

About You

Container orchestration. You're comfortable running Kubernetes or similar in production: auto-scaling, pod lifecycle, persistent storage, and networking. You can figure out why something won't schedule and reason about resource contention.

Distributed systems. You've built or maintained message-driven architectures on SQS, Kafka, or similar. You know how to keep jobs moving when things back up, retry without duplicating, and fail without losing work.

LLM infrastructure. You've run LLM workloads at scale: token instrumentation, rate-limit handling, prompt caching, and multi-provider routing. You've built the plumbing between models and external tools, and you know what it takes to keep it all running under load.

Experience. No hard rule. Roughly 3-5 years at this level, but more or less works if the above sounds like you.

What Makes This Different

It's infra, but the workload is AI agents. You're monitoring model behavior alongside pod health and debugging token throughput alongside network throughput.

Our customers are AI researchers and labs. You'd work directly with the people pushing the frontier of what agents can do, and build the infrastructure they run it on.

Early-stage team. You own whole systems, not tickets in a queue. One week you're shipping a new environment type; the next you're scaling the dispatch layer to handle 10x the throughput.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed