About The Position

As a member of the Low Power AI Solution team, you will play a critical role in deploying AI models on Qualcomm's low-power AI accelerator. The position focuses on mapping high-level machine learning operators to low-level hardware instructions, using optimization techniques such as graph transformation, scheduling, memory planning, individual operator implementation, and quantization. Your machine learning expertise will help improve the inference efficiency and accuracy of models running on Qualcomm's hardware architecture. This is a new position.

Requirements

  • Strong hands-on experience in performance optimization for embedded or low-power systems.
  • Proficient in C/C++ programming, with a focus on system-level and runtime development.
  • Solid understanding of embedded system design, including memory hierarchy and hardware-software interaction.
  • Experience with Linux/Android development environments and toolchains.
  • Familiarity with computer architecture, especially for AI accelerators or DSPs.
  • Basic knowledge of machine learning concepts and model structures.
  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
  • Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
  • PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Nice To Haves

  • Master’s degree in Computer Science, Engineering, or related field.
  • 5+ years of experience with ML frameworks (e.g., TensorFlow, PyTorch, ONNX).
  • 5+ years of experience in embedded system development and optimization for ML inference.
  • 5+ years of experience with C/C++ in performance-critical environments.
  • Experience with low-level OS interactions (Linux, Android, QNX).
  • Familiarity with quantization, graph optimization, and model deployment pipelines.
  • Experience working in cross-functional teams and large matrixed organizations.

Responsibilities

  • Design and implement core components of the ML runtime framework for inference on embedded systems.
  • Collaborate with compiler, hardware, and model teams to co-design efficient execution paths for AI workloads.
  • Develop and maintain C/C++ code for runtime kernels and system-level integration.
  • Develop tools to assist with performance profiling and debugging of quantized model accuracy.
  • Analyze and improve runtime behavior using profiling tools and hardware counters.
  • Support deployment of models from popular ML frameworks (e.g., ONNX, TensorFlow, PyTorch) onto Qualcomm’s inference stack.

Benefits

  • Competitive annual discretionary bonus program
  • Opportunity for annual RSU grants
  • Competitive benefits package designed to support your success at work, at home, and at play