Engineering Manager, Model Inference

Abridge, San Francisco, CA
$220,000 - $270,000

About The Position

Our generative AI-powered products are transforming the practice of medicine—and the inference systems that power them need to be fast, reliable, and world-class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The Inference team owns the end-to-end technical direction of how our models are served: from architecting low-latency, high-throughput infrastructure to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers, partner closely with ML Research and the broader AI Platform, and ensure the systems underpinning every clinician interaction are operating at peak efficiency and reliability.

Requirements

  • 5+ years of engineering experience with 1+ years in a technical leadership or management role
  • Deep, hands-on experience with ML systems and inference frameworks (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
  • Strong understanding of LLM architecture (e.g., Multi-Head Attention, Multi/Grouped-Query Attention, and common transformer components)
  • Experience with inference optimizations (e.g., batching, quantization, kernel fusion, FlashAttention)
  • Familiarity with GPU characteristics, roofline models, and performance analysis
  • Experience deploying reliable, distributed, real-time systems at scale
  • Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
  • Skilled at hiring and mentorship, with a demonstrated track record of helping engineers grow their skills and careers
  • Strong technical communication and cross-functional collaboration skills
  • Comfortable giving constructive feedback on technical designs and in code reviews
  • Track record of thriving in a fast-growing startup and operating with urgency and focus

Nice To Haves

  • Background in training infrastructure and RL workloads
  • Skilled in building secure, compliant systems on major cloud platforms (GCP preferred, AWS experience welcome)
  • Experience with Kubernetes and container orchestration at scale
  • Published work or contributions to inference optimization research

Responsibilities

  • Lead and grow a high-performing team of AI inference engineers focused on building and scaling infrastructure for Abridge’s products and APIs
  • Own the technical direction of our inference systems—making key decisions around batching, throughput, latency, and GPU utilization
  • Architect and scale inference infrastructure for reliability, efficiency, and observability; lead incident response
  • Benchmark and eliminate bottlenecks throughout the inference stack
  • Partner with ML Research teams on model optimization, quantization, and deployment
  • Develop APIs for AI inference used by both internal teams and external customers
  • Recruit, mentor, and develop engineering talent; establish team processes, engineering standards, and operational excellence
  • Work closely with the GenAI Platform, Data, and Product teams to plan and execute projects that directly impact clinicians and patients

Benefits

  • 14 paid holidays
  • Flexible PTO for salaried employees
  • Accrued time off for hourly employees
  • Medical, Dental, and Vision coverage for all full-time employees and their families
  • Generous HSA Contribution
  • Generous paid parental leave for all full-time employees
  • Family Forming Benefits
  • 401(k) Matching
  • Personal Device Allowance
  • Flexible Spending Accounts (FSA) and Commuter Benefits
  • Lifestyle Wallet
  • Mental Health Support
  • Paid Sabbatical Leave after 5 years of employment
  • Competitive compensation and equity grants for full-time employees