Senior Engineer, Inference Control Plane

DigitalOcean•Seattle, WA

$139,000 - $174,000•Hybrid

About The Position

We are seeking a Senior Engineer to build our Serverless Inference infrastructure and APIs and to contribute to their design and optimization. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support the next-generation inference needs of AI-native enterprises.

Requirements

5+ years of experience building and operating multi-tenant platforms or distributed backend systems
Strong experience operating high-scale distributed services in production environments
Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation
1+ years of hands-on experience with Go (Golang) in production systems
1+ years of experience with Kubernetes
Strong understanding of cloud-native architectures, microservices, and distributed systems fundamentals
Experience debugging performance, scalability, and reliability issues in production systems
Observability Proficiency: Experience tracking infrastructure and inference metrics like Time To First Token (TTFT), Time Per Output Token (TPOT), and GPU utilization.

Nice To Haves

AI/ML Framework Knowledge: Understanding of modern LLM serving architectures and familiarity with serving stacks such as vLLM, Triton, TensorRT-LLM, or similar technologies.
Experience with API gateways, traffic routing, or service mesh technologies
Experience building systems for inference optimization, rate limiting, routing, or workload orchestration

Responsibilities

Design and build scalable, multi-tenant services that power AI inference and intelligent routing workloads.
Develop and operate high-scale distributed systems with strong reliability, availability, and performance goals.
Strengthen platform resiliency through improved observability, capacity management, automation, and operational tooling.
Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
Contribute to architecture decisions around traffic management, service orchestration, reliability, and platform scalability.
Participate in on-call rotations and lead efforts to reduce operator pain, improve service health, and prevent recurring incidents.