About The Position

Join the Qualcomm AI Hub team and help developers integrate machine learning into their products and experiences: https://aihub.qualcomm.com/. In this role you will build tools that help developers optimize and deploy machine learning models on edge and mobile hardware. AIMET is Qualcomm's open-source library for state-of-the-art model quantization and compression techniques. You will develop and support cutting-edge model optimization workflows, pushing the boundary of what is possible on resource-constrained hardware. Applications range from quantizing large language models (LLMs) and generative AI models to compressing latency-critical vision, audio, and multimodal networks for deployment on Qualcomm Snapdragon and other edge SoCs. For this role we are seeking a talented and motivated Staff Software Engineer with expertise in optimizing and deploying ML models, especially for edge devices.

Requirements

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
  • Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
  • PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Nice To Haves

  • 3+ years of industry experience in machine learning, deep learning, or AI infrastructure
  • Strong proficiency in Python, with hands-on experience in PyTorch, ONNX, and/or TensorFlow
  • Solid understanding of neural network architectures — CNNs, Transformers, LLMs, diffusion models, multimodal models
  • Experience with model quantization techniques — PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods
  • Hands-on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization
  • Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar quantization frameworks is a strong plus
  • Experience working with ONNX, TFLite/LiteRT, or other model interchange formats
  • Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution
  • Experience collaborating across teams or BUs to drive technical alignment and model delivery
  • Proficiency with git and software development best practices
  • Strong written and verbal communication skills — ability to write clean APIs, documentation, and engage directly with external developers
  • Experience with C++ for performance-critical components is a bonus
  • Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus
  • Experience with automated evaluation pipelines and model benchmarking at scale is a plus
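Several of the bullets above reference PTQ and weight-only quantization. As a minimal illustration of the core idea (not Qualcomm's or AIMET's implementation, which handles per-channel scales, zero-points, and calibration), here is a hedged sketch of symmetric per-tensor INT8 weight quantization in plain Python:

```python
# Sketch of symmetric per-tensor INT8 weight quantization (PTQ-style).
# Illustrative only: real frameworks add per-channel scales, zero-points,
# and calibration; this shows just the quantize/dequantize round trip.

def quantize_int8(weights):
    """Map float weights to INT8 with a symmetric per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q, scale, max_err)
```

Small weights below half the quantization step (here, 0.003) collapse to zero on the round trip, which is exactly the accuracy loss that mixed-precision and sub-4-bit methods have to manage carefully.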

Responsibilities

  • Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework (PTQ, QAT, mixed-precision, AdaScale, etc.)
  • Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models
  • Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization
  • Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX
  • Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners
  • Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at large scale
  • Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware
  • Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub
  • Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub's growing model catalog
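The accuracy-debugging tooling described above is often built around per-layer comparisons of float versus quantized outputs, commonly summarized as signal-to-quantization-noise ratio (SQNR). A minimal sketch of the metric itself, using plain Python lists rather than real tensors (the layer outputs below are hypothetical values, not from any actual model):

```python
import math

# Hedged sketch: SQNR between a reference (float) output and its
# quantized counterpart. Real debugging tools sweep this metric over
# every layer's intermediate activations to localize accuracy loss.

def sqnr_db(reference, quantized):
    """Signal-to-quantization-noise ratio in dB; higher is better."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return float("inf")
    return 10.0 * math.log10(signal / noise)

fp_out = [1.0, -0.5, 0.25, 2.0]        # hypothetical float layer output
int8_out = [0.99, -0.51, 0.24, 2.02]   # hypothetical quantized output
print(sqnr_db(fp_out, int8_out))
```

Ranking layers by SQNR is a simple way to decide where mixed-precision should keep higher-precision weights or activations.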