About The Position

Qualcomm AI Hub is the platform for on-device AI, enabling developers to easily integrate, optimize, and deploy ML models on Qualcomm devices. Qualcomm AI Hub Workbench lets developers compile trained PyTorch or ONNX models into deployable artifacts targeting a variety of runtimes (LiteRT, ONNXRuntime, or the Qualcomm AI Engine Direct SDK, QAIRT) and profile and validate them on real Qualcomm devices hosted in the cloud.

Join the Qualcomm AI Hub Compiler team and own the infrastructure that powers these model compilations. You will work across the full compilation pipeline, from model ingestion and graph optimization to backend dispatch across CPU, GPU, and NPU, ensuring models compile correctly, execute efficiently, and scale across a growing catalog of on-device use cases spanning vision, audio, speech, and multi-modal models.

Requirements

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
  • Master's degree in Computer Science, Engineering, Information Systems, or a related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
  • PhD in Computer Science, Engineering, Information Systems, or a related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Nice To Haves

  • 3+ years of industry experience in ML infrastructure, compiler engineering, or AI framework development
  • Proficient in Python and C++
  • Solid understanding of ML compiler concepts (graph IRs, operator fusion, shape inference, lowering passes, backend partitioning) and hands-on experience with one or more compiler stacks such as MLIR, ONNX, or TVM
  • Experience with PyTorch model export (torch.export, torch.compile, FX, ATen IR) and on-device deployment frameworks such as LiteRT, ExecuTorch, or ONNXRuntime
  • Familiarity with SoC-level constraints (memory bandwidth, compute precision, NPU/DSP execution) and hardware-specific runtimes such as QAIRT/QNN
  • Experience building automated CI/CD pipelines for model compilation and validation at scale
  • Strong written and verbal communication skills; proficiency with git and software engineering best practices

Responsibilities

  • Design, develop, and maintain the end-to-end compilation pipeline powering Qualcomm AI Hub Workbench, from PyTorch and ONNX model ingestion through graph optimization to deployable artifacts targeting LiteRT, ONNXRuntime, or QAIRT on Snapdragon SoCs
  • Build and maintain ONNX-based compilation paths using ONNX IR: graph transformation passes, op validation, and opset compatibility handling
  • Build and maintain PyTorch compilation paths consuming torch.export output, including dynamic shapes, custom ops, and ATen IR decomposition
  • Contribute to the ONNXRuntime QNN execution provider: graph optimizations, graph partitioning, and op validation and lowering
  • Collaborate with QAIRT and QNN teams to ensure correct and efficient model execution across CPU, GPU, and NPU backends
  • Build tooling to analyze, profile, and debug compilation failures, accuracy regressions, and performance degradations; develop clear, actionable developer-facing diagnostics
  • Own compilation and validation of models published on Qualcomm AI Hub, ensuring correct conversion and verified performance across supported runtime targets
  • Build and maintain automated compilation pipelines and CI/CD evaluation harnesses to scale model onboarding as the Qualcomm AI Hub model catalog grows
  • Partner with internal Business Units to onboard models through Qualcomm AI Hub compilation workflows, translating deployment constraints (target SoC, latency budgets, memory limits) into concrete compilation strategies
  • Author technical documentation, tutorials, and example notebooks for the Qualcomm AI Hub developer community
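To give a flavor of the graph transformation passes mentioned above: a classic example is operator fusion, where adjacent ops are merged into one node so the backend can execute them as a single kernel. The sketch below is a toy illustration over a hypothetical linear node list, not Qualcomm AI Hub's actual IR or API (real compiler IRs such as ONNX graphs or ATen FX graphs are DAGs with richer node metadata):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical IR node: an op name plus its input names."""
    op: str
    inputs: list = field(default_factory=list)

def fuse_conv_relu(graph):
    """Toy fusion pass: merge each Conv immediately followed by Relu
    into a single ConvRelu node, leaving other nodes untouched."""
    fused = []
    i = 0
    while i < len(graph):
        node = graph[i]
        if node.op == "Conv" and i + 1 < len(graph) and graph[i + 1].op == "Relu":
            fused.append(Node("ConvRelu", node.inputs))
            i += 2  # skip the Relu we just absorbed
        else:
            fused.append(node)
            i += 1
    return fused

graph = [Node("Conv", ["x", "w"]), Node("Relu"), Node("MatMul", ["y", "z"])]
optimized = fuse_conv_relu(graph)
print([n.op for n in optimized])  # ['ConvRelu', 'MatMul']
```

A production pass would additionally check that the Conv output has no other consumers before fusing; that bookkeeping is exactly the kind of correctness work the op validation and partitioning responsibilities above describe.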