AWS designs custom SoCs (systems on chips) that power the world's largest machine learning training and inference clusters. Our organization builds both the SoCs and the low-level software stack that brings these chips to life: drivers that expose the hardware to the OS, runtime libraries that orchestrate computation, and collective communication software that coordinates thousands of chips working together across a network. We're looking for a Systems Software Engineer who wants to work at the boundary between hardware and software, in both pre-silicon and post-silicon environments, where the problems are hard, the debugging is deep, and the impact is enormous.

Our team develops SoC models and infrastructure to enable SoC validation, accelerate system software development, and support architectural exploration. As part of the ML accelerator systems modeling software team, you will:

- Develop and own components of our SoC models, from single-chip to datacenter scale
- Debug complex hardware/software interactions across the full software stack, from register-level bring-up on functional models and emulators to performance analysis on live silicon
- Collaborate with chip architects, RTL designers, modelers, compiler engineers, and ML framework teams to co-design and validate the hardware/software interface
- Contribute to the design of hardware features by providing a software perspective early in the chip development cycle
- Build tooling, test infrastructure, and automation that accelerate development for you and your teammates

Annapurna Labs, our organization within AWS, designs and deploys some of the largest custom silicon in the world. You'll work on software that runs on chips no one outside the team has seen yet, solving problems that don't have Stack Overflow answers. You'll see your code running in production on infrastructure that serves millions of ML workloads.
You will thrive in this role if you:

- Are comfortable reading hardware specs and translating them into working software
- Have debugged problems where the root cause could be in hardware, software, or the interface between them
- Have built firmware, drivers, runtime software, or communication libraries for SoCs, ASICs, GPUs, CPUs, or FPGAs
- Care about performance and have experience profiling and optimizing latency-sensitive or throughput-critical code paths
- Are comfortable writing C++ close to the hardware and using Python for tooling and automation
- Enjoy working on a small, high-impact team where you own significant pieces of the stack end to end

Although we are building machine learning chips, no machine learning background is needed for this role. Any ML knowledge required can be learned on the job; what matters is your ability to write great low-level software and reason about hardware.

This role can be based in either Cupertino, CA or Austin, TX.
Job Type: Full-time
Career Level: Mid Level