Engineering Manager, Model Inference

Abridge • San Francisco, CA

3d • $220,000 - $270,000

About The Position

Our generative AI-powered products are transforming the practice of medicine—and the inference systems that power them need to be fast, reliable, and world-class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The Inference team owns the end-to-end technical direction of how our models are served: from architecting low-latency, high-throughput infrastructure to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers, partner closely with ML Research and the broader AI Platform, and ensure the systems underpinning every clinician interaction are operating at peak efficiency and reliability.

Requirements

5+ years of engineering experience, including 1+ years in a technical leadership or management role
Deep, hands-on experience with ML systems and inference frameworks (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
Strong understanding of LLM architecture (e.g., Multi-Head Attention, Multi/Grouped-Query Attention, and common transformer components)
Experience with inference optimizations (e.g., batching, quantization, kernel fusion, FlashAttention)
Familiarity with GPU characteristics, roofline models, and performance analysis
Experience deploying reliable, distributed, real-time systems at scale
Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
Skilled at hiring and mentorship, with a demonstrated track record of helping engineers grow their skills and careers
Strong technical communication and cross-functional collaboration skills
Comfortable giving constructive feedback on technical designs and code reviews
Experience thriving in a fast-growing startup and operating with urgency and focus

Nice To Haves

Background in training infrastructure and RL workloads
Skilled in building secure, compliant systems on major cloud platforms (GCP preferred, AWS experience welcome)
Experience with Kubernetes and container orchestration at scale
Published work or contributions to inference optimization research

Responsibilities

Lead and grow a high-performing team of AI inference engineers focused on building and scaling infrastructure for Abridge’s products and APIs
Own the technical direction of our inference systems—making key decisions around batching, throughput, latency, and GPU utilization
Architect and scale inference infrastructure for reliability, efficiency, and observability; lead incident response
Benchmark and eliminate bottlenecks throughout the inference stack
Partner with ML Research teams on model optimization, quantization, and deployment
Develop APIs for AI inference used by both internal teams and external customers
Recruit, mentor, and develop engineering talent; establish team processes, engineering standards, and operational excellence
Work closely with the GenAI Platform, Data, and Product teams to plan and execute projects that directly impact clinicians and patients

Benefits

14 paid holidays
Flexible PTO for salaried employees
Accrued time off for hourly employees
Medical, Dental, and Vision coverage for all full-time employees and their families
Generous HSA Contribution
Generous paid parental leave for all full-time employees
Family Forming Benefits
401(k) Matching
Personal Device Allowance
Flexible Spending Accounts (FSA) and Commuter Benefits
Lifestyle Wallet
Mental Health Support
Paid Sabbatical Leave after 5 years of employment
Competitive compensation and equity grants for full-time employees