About The Position

THIS IS A FULL‑TIME ONSITE ROLE REQUIRING 5 DAYS A WEEK IN THE OFFICE AT QUALCOMM’S SAN DIEGO LOCATION.

As a leading technology innovator, Qualcomm pushes the boundaries of what’s possible to enable next‑generation experiences and drive digital transformation, creating a smarter, connected future for all.

As a Staff/Sr. Staff Software Engineer on the Qualcomm AI Stack SDK Software team, you will design, develop, and deliver advanced AI/ML software solutions for Generative AI inference on Snapdragon platforms. This role focuses on model optimization, quantization, graph transformations, and runtime execution for modern AI architectures, including LLMs, LVMs, and LMMs. You will work at the intersection of machine learning algorithms, inference optimization, graph lowering, and systems software, contributing directly to the Qualcomm AI Stack SDK (QAIRT) and associated tools, including delegate support for the ONNX Runtime, ExecuTorch, and TFLite/LiteRT frameworks. You will collaborate with ML Research, AI accelerator HW/SW teams, Product Management, Program Management, and QA to drive features from concept to production. The role requires strong technical ownership: the ability to work independently, drive features end-to-end, and mentor junior engineers.

Requirements

  • Bachelor’s degree in computer science, computer engineering, or a related field and 6+ years of experience in software design, development, and delivery; OR
  • Master’s degree or PhD in computer science, computer engineering, or a related field and 5+ years of experience in software design, development, and delivery.
  • 3+ years of hands‑on experience in AI/ML software development, with a focus on inference or model optimization.
  • Strong understanding of AI/ML fundamentals, including deep learning and inference pipelines.
  • Deep understanding of transformer architectures, attention mechanisms, and performance trade‑offs.
  • Proficiency in Python and C/C++ for production‑quality software development.
  • Experience working with PyTorch and ONNX models and tooling.
  • Strong debugging skills, including the ability to perform root‑cause analysis and ensure high system reliability.
  • Ability to work independently, collaborate across teams, and drive complex features end-to-end.

Nice To Haves

  • Working knowledge of graph theory, graph optimizations, and compiler‑style transformations.
  • Experience with LLM, LVM, and LMM inference pipelines, including prefill and generation workflows.
  • Familiarity with the Hugging Face ecosystem, including model repositories and interfaces such as PEFT.
  • Experience with LoRA, MoE‑based models, and awareness of modern GenAI inference techniques.
  • Experience with Android and/or RTOS environments (e.g., QNX).
  • Experience with CMake‑based build environments, agile software development practices, and git‑based SCM.
  • 2+ years of experience in embedded software or system‑level software development and optimization.
  • At least 2 years of experience interacting with senior leadership (Director level and above).
  • Ability to collaborate across a globally diverse team and manage multiple priorities.
  • Previous experience mentoring junior engineers.
  • User‑level or development experience with Qualcomm AI Stack / SDKs (e.g., QAIRT, QNN, Genie).
  • Exposure to Snapdragon SoCs and AI accelerators such as the NPU.
  • Prior hands‑on experience with GenAI features such as transformer architectures, LoRA, MoE, speculative decoding, and vision encoder/decoder models.

Responsibilities

  • Convert, optimize, and deploy AI models from PyTorch and ONNX frameworks for efficient inference on Snapdragon platforms.
  • Design and implement graph transformations, graph lowering, and optimization techniques within AI runtime environments such as ONNX Runtime, ExecuTorch, and the Qualcomm AI Stack SDK.
  • Apply knowledge of quantization and performance optimization to improve latency, throughput, memory usage, and power efficiency.
  • Work at the forefront of Generative AI, understanding advanced algorithms such as attention mechanisms, Mixture‑of‑Experts (MoE), Low‑Rank Adapters (LoRA), and emerging inference optimization techniques (e.g., speculative decoding).
  • Collaborate with ML Research teams to prototype and productize new features and techniques into SDK solutions.
  • Debug complex issues across models, runtime, OS, compiler, and hardware layers, working closely with QA and customer teams.
  • Design, implement, and deliver new features and enhancements to the Qualcomm AI Stack SDK.
  • Participate in design reviews and code reviews, ensuring software quality and maintainability.
  • Mentor junior engineers, helping them prioritize work and drive execution across multiple initiatives.