Qualcomm · Posted 2 months ago
$158,400 - $237,600/Yr
Entry Level
San Diego, CA
5,001-10,000 employees

Qualcomm is applying its traditional strengths in digital wireless technologies to play a central role in the evolution of Cloud AI. We are investing in several supporting technologies, including deep learning. The Qualcomm Cloud AI team is developing hardware and software solutions for inference acceleration. We are hiring LLM Serving Engineers at multiple levels to join our dynamic, collaborative team. This role spans the full product lifecycle, from cutting-edge research and development to commercial deployment, and demands strategic thinking, strong execution, and excellent communication skills.

Responsibilities:

  • Build a scalable LLM inference platform using modern serving techniques (e.g., disaggregated serving, KV-cache management, advanced parallelism, speculative decoding, model optimization, specialized kernels).
  • Contribute to the development of LLM serving packages (e.g., vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
  • Work closely with customers to drive solutions, collaborating with internal compiler, firmware, and platform teams.
  • Work at the forefront of GenAI by understanding advanced algorithms (e.g., attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
  • Drive efficient serving through smart autoscaling, load balancing, and routing.
  • Engage with open-source serving communities to evolve these frameworks.
Minimum Qualifications:

  • Hands-on experience with one or more of the following LLM serving/orchestration packages: Triton Inference Server, vLLM, SGLang, Ollama, llm-d, KServe, LMCache, Mooncake.
  • Deep understanding of foundational LLMs, VLMs, SLMs, and transformer-based architectures.
  • Strong experience developing language models using PyTorch.
  • Strong computer science fundamentals: algorithms, data structures, and parallel and distributed programming.
  • Understanding of computer architecture, ML accelerators, in-memory processing, and distributed systems.
  • Strong Python development skills for large-scale projects, with a passion for software engineering.
  • Experience analyzing, profiling, and optimizing deep learning workloads.
  • A proactive approach to learning the latest inference optimization techniques.
  • Excellent communication and problem-solving skills, with the ability to thrive in a fast-paced, collaborative environment.
  • MS in Computer Science, Machine Learning, Computer Engineering, or Electrical Engineering.
Preferred Qualifications:

  • Open-source contributions to any GenAI package.
  • Experience architecting and developing large-scale distributed systems.
  • High-level kernel design experience (PyTorch, CUDA, Triton).
  • Knowledge of torch.compile or TorchDynamo.
  • PhD in Computer Science, Computer Engineering, or Machine Learning.
Benefits:

  • Competitive annual discretionary bonus program.
  • Opportunity for annual RSU grants.
  • Highly competitive benefits package designed to support your success at work, at home, and at play.
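For candidates unfamiliar with the serving techniques named above, here is a toy sketch of paged KV-cache block allocation, the idea behind vLLM-style memory management. The class name, block size, and API below are illustrative assumptions, not any framework's real interface:

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative choice)

class KVBlockAllocator:
    """Hands out fixed-size cache blocks so sequences of different
    lengths can share one GPU memory pool without fragmentation."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of block ids owned
        self.seq_lens = {}      # seq_id -> tokens generated so far

    def append_token(self, seq_id: int) -> None:
        """Reserve a new block only when a sequence crosses a block boundary."""
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full (or first token)
            if not self.free_blocks:
                raise MemoryError("cache full: preempt or swap out a sequence")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

A 20-token sequence occupies only ceil(20 / 16) = 2 blocks, and releasing it immediately makes those blocks available to other requests; this is the core of the KV-cache management responsibility listed above.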