AI Accuracy Architect

Qualcomm • San Diego, CA

About The Position

Qualcomm is leveraging its leadership in compute, connectivity, and AI acceleration to play a central role in the evolution of Cloud AI. The Qualcomm Cloud AI team develops hardware and software platforms enabling efficient, high-quality inference of large-scale foundation models. We are seeking a Staff Engineer – AI Accuracy Architect to lead accuracy-centric architecture and optimization for LLMs, VLMs, and emerging multimodal models, working closely with compiler, performance, and model optimization teams. This role spans Day0 hardware enablement through production deployment and requires deep expertise in quantization, numerics, and accuracy–performance tradeoffs across the inference stack. This is a senior technical role with broad cross-functional impact.

Requirements

Extensive hands-on experience with LLMs and/or VLMs in production or preproduction environments.
Expert-level understanding of quantization and numerics, including precision tradeoffs and accumulation behavior.
Deep knowledge of transformer architectures, attention mechanisms, and MoEs.
Proven ability to balance accuracy, performance, and hardware constraints.
Experience across compiler, kernel, and hardware abstraction layers.
Strong Python skills and ability to scale accuracy experiments.
Solid foundation in computer architecture and ML accelerators.
Strong technical leadership and communication skills.
MS in CS, CE, EE, or related field, or equivalent experience.
Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; or
Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; or
PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Nice To Haves

PhD in a related field.
Experience with ML compilers and torch.compile.
Background in numerical methods, linear algebra, and accuracy evaluation frameworks.

Responsibilities

Own accuracy architecture for LLM, VLM, and multimodal inference, balancing model quality, performance, power, and hardware constraints.
Lead Day0 enablement of cutting-edge models on current and future Qualcomm AI platforms in partnership with compiler, performance, firmware, and silicon teams.
Design, implement, and evaluate quantization strategies (e.g., PTQ, QAT, mixed precision, per-channel/group-wise), understanding their impact on accuracy, latency, throughput, and memory.
Analyze and resolve accuracy regressions and numerical stability issues across kernels, compilers, runtimes, and hardware.
Partner with performance engineers to co-optimize kernels and execution strategies where accuracy and performance intersect.
Drive model conversion, optimization, and deployment using PyTorch and ONNX, with accuracy validation as a first-class requirement.
Define accuracy evaluation metrics and tooling to track regressions and improvements over time.
Serve as a technical authority and mentor on accuracy, quantization, and numerics across teams.
Engage with customers and partners to debug complex accuracy issues and deliver production-ready solutions.

Benefits

We also offer a competitive annual discretionary bonus program and opportunity for annual RSU grants (employees on sales-incentive plans are not eligible for our annual bonus). In addition, our highly competitive benefits package is designed to support your success at work, at home, and at play. Your recruiter will be happy to discuss all that Qualcomm has to offer – and you can review more details about our US benefits at this link.
