About The Position

We are looking for a Distributed Systems Engineer to own the infrastructure powering our core Siri Agentic Evaluation Platform. Evaluation is no longer just a static test suite; it is a highly dynamic distributed problem at massive scale. Our platform enables teams to run high-throughput agentic simulations, orchestrate multi-model judging pipelines, and generate real-time observability dashboards across billions of tokens and complex data types. In this role, you will design the execution engine that coordinates these evaluation loops. You will build systems that remain deterministic, fault-tolerant, and cost-efficient, even when coordinating massively parallel requests across heterogeneous device types (iPhones, Macs, iPads, etc.).

Requirements

  • MS in computer science or equivalent
  • 7+ years of experience as a distributed systems engineer, platform engineer, or equivalent
  • Strong proficiency in languages optimized for concurrency and enterprise scale, such as Python (asyncio) or Java
  • Deep expertise in designing robust, versioned production APIs using gRPC/Protobuf, GraphQL, or REST (FastAPI)

Nice To Haves

  • Strong experience modeling complex relational data and trace hierarchies using PostgreSQL, combined with high-throughput analytical query layers.
  • Experience designing asynchronous, event-driven architectures using Kafka, AWS SQS/SNS, RabbitMQ, or Redis Streams.
  • Advanced experience with Kubernetes (orchestration, custom operators, service meshes like Istio or Linkerd) and cloud providers (AWS, GCP, or Azure).
  • Experience building Agentic RAG platforms or developer-facing infrastructure tooling.
  • Proficiency with Terraform to manage infrastructure declaratively.
  • Experience building automated, containerized deployment pipelines (e.g., GitHub Actions or ArgoCD) with an emphasis on keeping developer feedback loops fast and reliable.

Responsibilities

  • Design the execution engine that coordinates complex evaluation loops.
  • Build systems that remain deterministic, fault-tolerant, and cost-efficient.
  • Coordinate massively parallel requests across heterogeneous device types (iPhones, Macs, iPads, etc.).
© 2026 Teal Labs, Inc