ML Accelerator Performance Validation Engineer, Post Silicon Validation

Amazon, Austin, TX
$143,700 - $194,400, Onsite

About The Position

Join our Post-Silicon Validation team to quantify and qualify the performance of AWS's custom ML training chips against architectural targets. You'll bridge the gap between silicon capabilities and the demands of real-world ML workloads, ensuring our accelerators meet their latency, throughput, and efficiency goals at cloud scale. You'll work in a fast-paced, startup-like environment alongside some of the brightest minds in the industry on next-generation AI/ML hardware that powers AWS's training and inference infrastructure. Your analysis will directly shape architectural decisions for future accelerators and determine when silicon is ready for production deployment.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience with machine learning and large language model (LLM) fundamentals, including model architecture, training and inference lifecycles, and optimization of model execution, or experience working with PyTorch or JAX
  • Bachelor's degree in computer science, engineering, mathematics, or equivalent, or experience programming in Java, C++, Python, or a related language
  • 3+ years of experience with hardware performance counters and profiling tools for analyzing and optimizing system and application performance
  • Strong understanding of computer architecture fundamentals including memory hierarchies (caches, DRAM, HBM), compute pipelines, and interconnect topologies
  • Experience applying statistical methods, regression analysis, and data visualization techniques to interpret performance data and drive optimization decisions

Nice To Haves

  • 3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Experience with CUDA kernels or ML/low-level kernels, or experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware
  • Knowledge of collective communication operations (e.g., AllReduce, AllGather) and their scaling behavior
  • Experience with HBM, PCIe, and/or DMA bandwidth characterization

Responsibilities

  • Design and execute performance benchmarks ranging from micro-architectural tests to full model training runs
  • Measure and analyze compute throughput, memory bandwidth, interconnect latency, and more
  • Profile real ML workloads (transformer models, LLMs, vision models) on silicon
  • Identify performance bottlenecks and work with architecture teams on optimization
  • Build automated performance regression dashboards and tracking infrastructure
  • Correlate silicon measurements against RTL simulation and emulation predictions

Benefits

  • Health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, and adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave
  • Sign-on payments
  • Restricted stock units (RSUs)