ML Accelerator Performance Validation Engineer, Post Silicon Validation

Amazon•Austin, TX

1d•$143,700 - $194,400•Onsite

About The Position

Join our Post-Silicon Validation team to quantify and qualify the performance of AWS's custom ML training chips against architectural targets. You'll bridge the gap between silicon capabilities and real-world ML workload demands, ensuring our accelerators deliver on latency, throughput, and efficiency promises at cloud scale. You'll work in a fast-paced, startup-like environment alongside some of the brightest minds in the industry on next-generation AI/ML hardware that powers AWS's training and inference infrastructure. Your analysis will directly shape architectural decisions for next-generation accelerators and determine when silicon is ready for production deployment.

Requirements

3+ years of non-internship professional software development experience
2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience working with PyTorch or JAX software
Bachelor's degree in computer science, engineering, mathematics or equivalent, or experience in Java, C++, Python, or a related language
3+ years of experience with hardware performance counters and profiling tools for analyzing and optimizing system and application performance
Strong understanding of computer architecture fundamentals including memory hierarchies (caches, DRAM, HBM), compute pipelines, and interconnect topologies
Experience applying statistical methods, regression analysis, and data visualization techniques to interpret performance data and drive optimization decisions

Nice To Haves

3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Experience with CUDA kernels or ML/low-level kernels, or experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware
Knowledge of collective communications (AllReduce, AllGather) and scaling
Experience with HBM, PCIe, and/or DMA bandwidth characterization

Responsibilities

Design and execute performance benchmarks spanning micro-architectures to full model training
Measure and analyze compute throughput, memory bandwidth, interconnect latency, and related performance metrics
Profile real ML workloads (transformer models, LLMs, vision models) on silicon
Identify performance bottlenecks and work with architecture teams on optimization
Build automated performance regression dashboards and tracking infrastructure
Correlate silicon measurements against RTL simulation and emulation predictions

Benefits

health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
401(k) matching
paid time off
parental leave
sign-on payments
restricted stock units (RSUs)
