Infrastructure Engineer

San Francisco, CA
$200,000 - $350,000

About The Position

Arena Intelligence is the open platform for evaluating how AI models perform in the real world. Born out of UC Berkeley’s SkyLab and the team behind Arena.ai, our leaderboards are the industry’s gold standard for AI model evaluation, trusted by researchers, developers, and enterprises shaping the future of AI. We’re building API-based services for enterprise customers: we want to deliver the best of our ML products (like Arena Max), customized to each customer’s business. You’ll own core infrastructure that turns our research advantage into enterprise products. This is a founding role on the developer/enterprise team; you’ll work directly with the founders and early customers to define what we build and how we build it.

Requirements

  • 4+ years of backend engineering experience, with meaningful time spent on distributed systems, infrastructure, or developer-facing platforms.
  • Strong proficiency in Go and/or Rust, with hands-on experience building high-throughput APIs or proxy/gateway systems.
  • Experience with LLM provider APIs (OpenAI, Anthropic, Google, etc.) and a working understanding of the challenges: streaming, token management, rate limits, model-specific quirks.
  • Solid cloud infrastructure skills — you’re comfortable with AWS or GCP, Kubernetes, Terraform, and database systems like Postgres and Redis.
  • A product-oriented mindset. You think about the developer experience of your APIs, not just the implementation. You ask “why” before “how.”
  • Comfort with ambiguity. We’re a startup. Scope is fluid, context shifts, and you’ll wear many hats. That should sound exciting, not stressful.

Nice To Haves

  • Experience building API gateways, proxies, or developer tools (Bifrost, Kong, Envoy, Tyk, or custom).
  • Background in ML infrastructure, model serving, or evaluation frameworks.
  • Experience building enterprise-ready features: SSO, RBAC, audit logs, multi-tenancy.
  • Familiarity with the modern AI infra stack (vLLM, LiteLLM, LangChain, etc.).

Responsibilities

  • Build API-based products from the ground up. Design and implement low-latency, high-reliability APIs for leaderboards, models, and arenas.
  • Solve hard streaming problems. Handle SSE/streaming responses across heterogeneous providers, including partial failure recovery, mid-stream fallback, and consistent response normalization.
  • Ship enterprise-grade infrastructure. Build the systems enterprise customers expect: rate limiting, authentication, usage metering, cost attribution, audit logging, and SOC 2 compliance.
  • Build deep observability. Instrument infrastructure with distributed tracing, latency breakdowns, token-level usage tracking, and real-time dashboards so customers (and we) can see exactly what’s happening.
  • Build AI-centered products. Integrate with our core evaluation platform, Arena data, and customer-specific benchmarks. Collaborate with the research team to turn novel ideas into full-featured products.
  • Flex across the stack. Contribute to the backend of our Leaderboards and Evals platforms when needed, helping unify our public and private data architectures.

Benefits

  • Comprehensive health, dental, vision, and additional support programs.
  • The opportunity to work on cutting-edge AI with a small, mission-driven team.
  • A culture that values transparency, trust, and community impact.
  • Visa sponsorship available.