Staff Backend Engineer - Distributed Systems

Archetype AI
San Mateo, CA

About The Position

As a Staff Backend Engineer, you will lead the design and scaling of the core backend systems that power our AI platform. You will collaborate closely with ML researchers, product teams, and other engineers to bring cutting-edge AI models into production at scale, ensuring performance, reliability, and operational excellence. This role goes beyond coding: you will own complex systems end-to-end, influence architectural decisions, drive technical strategy, mentor other engineers, and elevate the overall engineering culture.

Requirements

  • 7+ years of professional software engineering experience, with a focus on backend or distributed systems.
  • Deep understanding of distributed systems fundamentals: concurrency, consistency, replication, fault tolerance, and networking.
  • Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP).
  • Strong debugging, instrumentation, and observability skills across distributed systems.
  • Demonstrated ownership of complex technical problems and ability to learn and adapt quickly.

Nice To Haves

  • 7+ years of professional software engineering experience, with deep expertise in backend or distributed systems.
  • Strong understanding of distributed systems fundamentals: concurrency, consistency, replication, fault tolerance, and networking.
  • Experience building and operating production-grade systems at scale in cloud environments (AWS, GCP, Azure).
  • Advanced debugging, instrumentation, and observability skills across complex distributed systems.
  • Proven ownership of complex technical problems and ability to drive them to completion.
  • Experience mentoring engineers and influencing architectural decisions across teams.

Responsibilities

  • Lead the architecture, design, and implementation of distributed systems supporting high-throughput, low-latency AI model inference and data services.
  • Collaborate with ML researchers and product teams to transition experimental models into production-grade systems.
  • Define technical strategy and best practices for backend systems, including GPU clusters, cloud infrastructure, and distributed data pipelines.
  • Drive performance optimization, reliability, and operational excellence across large-scale systems.
  • Build internal tools, monitoring, and observability frameworks to proactively detect and resolve issues.
  • Introduce innovative architectures, techniques, and automation to maximize scalability, efficiency, and reliability.
  • Mentor engineers, lead by example, and foster a culture of engineering excellence, knowledge sharing, and collaboration.
  • Balance rapid iteration on early-stage systems with long-term architectural soundness and maintainability.
  • Take ownership of end-to-end problem solving, from design through deployment, ensuring high-quality, robust delivery.