Qualcomm is seeking a low-level embedded engineer with a strong foundation in software and processor architecture to help shape architectural features and deliver measurable performance improvements on Qualcomm's Neural Processing Unit (NPU). In this role, you will work across the instruction set architecture, operating system, and processor architecture, partnering closely with hardware and software teams to turn ML workload insights into architecture and software optimizations. The ideal candidate will be proficient in processor architecture, C/C++ and assembly, embedded operating systems, and performance profiling tools. Experience with AI workloads and large language models (LLMs) is a plus but not required.