Post-Silicon Systems Validation Engineer, Annapurna Labs

Amazon | Austin, TX
$143,700 - $194,400 | Onsite

About The Position

Annapurna Labs, an AWS organization with development centers in the U.S. and Israel, builds custom silicon and software for AWS customers. Our team combines cloud-scale innovation with world-class expertise across silicon engineering, hardware design, verification, software, and operations to tackle technical challenges that have never been seen before.

Join our Silicon Validation team to validate next-generation machine learning accelerators that power AWS's cloud computing infrastructure. You'll work in a fast-paced, startup-like environment alongside some of the brightest minds in the industry on cutting-edge, internet-scale technology that directly impacts how customers use Machine Learning acceleration. We are changing the landscape of cloud infrastructure by accelerating the development of custom silicon and moving beyond traditional partnerships to dominate in AI training and inference.

Your work will span validation of the complete vertical stack: silicon, PCB, high-speed components (HBM, PCIe, chip-to-chip), inter-system connections, and system-to-system interfaces. You'll dive deep into the new hardware components and scaling technologies that power our Machine Learning boards and servers, ensuring every component of our hardware and software comes together into products our customers rely on.

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution, or experience working with PyTorch or JAX software
  • Bachelor's degree in computer science, engineering, mathematics or equivalent, or experience in Java, C++, Python, or a related language
  • 3+ years of experience with hardware performance counters and profiling tools for analyzing and optimizing system and application performance
  • Strong understanding of computer architecture fundamentals including memory hierarchies (caches, DRAM, HBM), compute pipelines, and interconnect topologies
  • Experience applying statistical methods, regression analysis, and data visualization techniques to interpret performance data and drive optimization decisions
  • Strong programming skills (Python, Lua, C/C++, Rust, Go, etc.)
  • A solid understanding of computer architecture
  • Experience with AWS services, cloud infrastructure, firmware development (BIOS, BMC, drivers)
  • Validation experience in any of these areas: PCIe, HBM, GPUs, neural networks, ML HW architecture, and/or CI/CD
  • Familiarity with the validation lifecycle from RTL simulation (SystemVerilog/UVM, VCS, Questa, Xcelium) and emulation (Palladium, Zebu, Veloce) through silicon failure analysis and debug

Nice To Haves

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Experience with Machine Learning Hardware/Software Architecture
  • Experience with CI/CD
  • Experience with EDA Simulations or Emulation

Responsibilities

  • Own critical validation aspects across the entire product development lifecycle—from early design validation through emulation, silicon bring-up, post-silicon validation, and ongoing support of production systems deployed in AWS data centers.
  • Collaborate deeply with architecture, RTL design, design verification, firmware, and software teams to ensure our next-generation AI/ML accelerators meet the highest standards of quality and performance.
  • Develop comprehensive validation strategies and detailed test plans covering functional, performance, power, and stress testing from silicon bring-up to product release.
  • Execute complex test plans from RTL simulation and emulation environments through physical silicon validation.
  • Conduct hands-on silicon bring-up and debug in the lab using oscilloscopes, logic analyzers, and protocol analyzers.
  • Validate ML accelerator performance, accuracy, and reliability using real-world neural network workloads.
  • Build test infrastructure, CI/CD, and automated regression frameworks to enable efficient validation at scale.
  • Collaborate across architecture, design, firmware, and software teams to triage failures and drive root cause analysis to closure.
  • Review test results, identify patterns, and provide feedback to improve design quality and validation coverage.
  • Support production systems in AWS data centers and address field issues as they arise.

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave
  • sign-on payments
  • restricted stock units (RSUs)