Senior Software Engineer, Machine Learning Infrastructure - Generative AI

DoorDash USA
Sunnyvale, CA
$137,100 - $299,300

About The Position

You will join a small, high-leverage team building production infrastructure for Generative AI at DoorDash. You will lead the design and architecture of our open-weights model platform, which covers real-time GPU serving, high-throughput batch inference, and model fine-tuning. You’ll set technical direction across model serving and inference engines, fine-tuning and training pipelines, GPU autoscaling and utilization, batch pipelines, backend services, and observability, and mentor engineers as you go. This role is ideal for a senior engineer who enjoys owning ambiguous, high-impact systems and pushing the cost/performance frontier of GPU inference and fine-tuning. It is a fast-moving technical area in which product needs, model capabilities, vendor ecosystems, and cost/performance tradeoffs are evolving quickly.

Requirements

  • B.S., M.S., or Ph.D. in Computer Science or equivalent
  • 6+ years of industry experience in software engineering
  • Deep backend engineering fundamentals, especially in Python and distributed systems.
  • Track record of designing and owning production services, APIs, data pipelines, or ML infrastructure at scale.
  • Experience operating systems in production, including observability, debugging, reliability, incident response, and performance/cost optimization.
  • Deep hands-on experience with LLM inference and/or fine-tuning of open-weight models in production — serving (latency, throughput, batching, autoscaling, GPU utilization) and/or fine-tuning (SFT/DPO/LoRA).
  • Demonstrated technical leadership: leading design across ambiguous, fast-moving technical areas, mentoring engineers, and turning customer use cases into reusable platform capabilities
  • Proficiency in using AI coding tools (e.g., Claude Code, Codex, Cursor) across the full software development lifecycle, including design, code generation, testing, monitoring, and release

Nice To Haves

  • Experience with LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM) in production
  • Experience with distributed/multi-node fine-tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation and evaluation
  • GPU performance work — multi-node/distributed inference, KV-cache/memory optimization, quantization (FP8/INT8/AWQ/GPTQ), or cold-start/throughput tuning
  • Experience with Kubernetes, cloud infrastructure (AWS/GCP), GPUs, serverless/elastic GPU platforms (e.g., Modal), or high-throughput batch systems
  • Experience with LLM gateways, model routing, vendor abstraction, or cost attribution
  • Experience building developer platforms, internal platforms, or self-serve infrastructure
  • Experience building and deploying AI agents or MCP servers in production
  • Experience with eval systems, LLM observability, tracing, RAG, search, or vector databases

Responsibilities

  • Lead the design of infrastructure that helps DoorDash teams move GenAI ideas from prototype to production, increasing the velocity of business impact from AI across the company.
  • Own and evolve our open-weights serving stack — real-time GPU endpoints, high-throughput batch inference, and fine-tuning (SFT/DPO/LoRA) — alongside the LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution.
  • Architect scalable, high-performance systems for model serving, batch inference, GPU autoscaling, and fine-tuning that power real customer and internal automation use cases.
  • Push the cost and latency frontier of GPU inference — turning batch jobs that took days into hours and cutting inference cost by multiples — while giving product teams a clean choice across open-weight and closed-source models with reliability, fallback, observability, and cost controls built in.
  • Build platforms that support rapid experimentation while meeting production standards for latency, scale, monitoring, SLOs, playbooks, and operational excellence.
  • Partner closely with — and raise the technical bar for — ML engineers, product engineers, data scientists, and platform teams across DoorDash, Wolt, and Deliveroo to turn emerging GenAI capabilities into durable platform primitives.
  • Set technical direction for the future of DoorDash’s centralized GenAI platform — including emerging directions such as reinforcement learning (RLHF/RLVR), agent optimization, and other post-training and agentic techniques — enabling the next generation of AI-powered products, agents, automation, and personalization.

Benefits

  • 401(k) plan with employer matching
  • 16 weeks of paid parental leave
  • wellness benefits
  • commuter benefits match
  • paid time off
  • paid sick leave
  • medical, dental, and vision benefits
  • 11 paid holidays
  • disability and basic life insurance
  • family-forming assistance
  • mental health program