About The Position

At LangChain, our mission is to make intelligent agents ubiquitous. We provide the agent engineering platform and open source frameworks developers need to ship reliable agents fast. Our open source frameworks, LangChain and LangGraph, see more than 90 million downloads per month and help developers build agents with speed and granular control. LangSmith offers observability, evaluation, and deployment for rapid iteration, enabling teams to transform LLM systems into dependable production experiences. LangChain is trusted by millions of developers worldwide and powers AI teams at companies like Replit, Clay, Cloudflare, Harvey, Rippling, Vanta, Workday, and more.

About the role

In person 5 days/week in San Francisco, CA or New York, NY

We're building purpose-built infrastructure for running AI agents. Unlike traditional web apps, agents run for long durations, collaborate asynchronously with humans and other agents, and need to survive failures mid-execution. LangSmith Deployments is the runtime that makes this work, with durable checkpointing, fault-tolerant orchestration, and horizontal scaling, deployed across cloud and self-hosted environments.

We're looking for a Senior Backend Engineer to work on this system. While the focus is on backend development, strong familiarity with Kubernetes (K8s), Terraform (Tf), and other DevOps tooling is highly preferred.

Requirements

  • 4+ years of professional backend engineering experience
  • Strong proficiency in Go and/or Python
  • Experience with distributed systems: consensus mechanisms, queueing, state machines, and/or workflow orchestration
  • Experience with scaling and sharding databases in high-throughput environments
  • Familiarity with Kubernetes, infrastructure-as-code, and at least one major cloud platform
  • Strong communication skills and ability to work cross-functionally on a small team

Responsibilities

  • Design distributed queue and worker systems that handle concurrent agent execution, background tasks, and multi-agent coordination across horizontally scalable infrastructure
  • Own core data infrastructure — state persistence, atomic job claiming, connection management, and schema evolution
  • Collaborate on architectural decisions, ensuring solutions are scalable and robust
  • Ship resumable streaming infrastructure so clients can disconnect and reconnect mid-execution without losing state
  • Instrument and monitor production systems — tracing, metrics, and alerting to keep the platform healthy
  • Participate in on-call rotations and own incident response for the runtime
  • Create and maintain technical documentation, including system design and operational runbooks
  • Contribute to and extend open-source LangGraph, which is used by thousands of developers to build agent applications

Benefits

  • We offer competitive compensation that includes base salary, meaningful equity, and benefits such as health and dental coverage, flexible vacation, a 401(k) plan, and life insurance.
  • Actual compensation will vary based on role, level, and location.
  • For team members in the EU and UK, we provide locally competitive benefits aligned with regional norms and regulations.