Amazon Devices is seeking an ML Kernel Performance Engineer to work at the hardware-software boundary of its advanced compression platform and custom neural accelerator silicon. The role focuses on writing high-performance CUDA and Triton kernels that optimize neural network compression algorithms for training, fine-tuning, and inference. The engineer will also build tooling and kernel libraries that make GPU performance optimization broadly accessible, so that scientists and engineers can profile and diagnose kernel bottlenecks without deep CUDA expertise. The work ensures that novel quantization schemes and sparse computation patterns translate into real throughput gains on GPU hardware, which directly accelerates training runs and enables deployment of compressed models to edge devices and cloud inference.
Job Type: Full-time
Career Level: Mid Level
Education Level: Associate degree