AWS designs custom SoCs (systems on chips) that power the world's largest machine learning training and inference clusters. Our organization builds both the SoCs and the low-level software stack that brings these chips to life: drivers that expose the hardware to the OS, runtime libraries that orchestrate computation, and collective communication software that coordinates thousands of chips working together across a network. We're looking for a Systems Software Engineer who wants to work at the boundary between hardware and software, in both pre-silicon and post-silicon environments, where the problems are hard, the debugging is deep, and the impact is enormous.

Our team develops SoC models and infrastructure to enable SoC validation, accelerate system software development, and support architectural exploration. As part of the ML accelerator systems modeling software team, you will:

- Develop and own components of our SoC models, from single-chip to datacenter scale
- Debug complex hardware/software interactions across the full software stack, from register-level bring-up on functional models and emulators to performance analysis on live silicon
- Collaborate with chip architects, RTL designers, modelers, compiler engineers, and ML framework teams to co-design and validate the hardware/software interface
- Contribute to the design of hardware features by providing a software perspective early in the chip development cycle
- Build tooling, test infrastructure, and automation that accelerate development for you and your teammates

Annapurna Labs, our organization within AWS, designs and deploys some of the largest custom silicon in the world. You'll work on software that runs on chips no one outside the team has seen yet, solving problems that don't have Stack Overflow answers. You'll see your code running in production on infrastructure that serves millions of ML workloads.
You will thrive in this role if you:

- Are comfortable reading hardware specs and translating them into working software
- Have debugged problems where the root cause could be in hardware, software, or the interface between them
- Have built firmware, drivers, runtime software, or communication libraries for SoCs, ASICs, GPUs, CPUs, or FPGAs
- Care about performance and have experience profiling and optimizing latency-sensitive or throughput-critical code paths
- Are comfortable writing C++ close to the hardware and using Python for tooling and automation
- Enjoy working on a small, high-impact team where you own significant pieces of the stack end to end

Although we are building machine learning chips, no machine learning background is needed for this role. Any ML knowledge required can be learned on the job; what matters is your ability to write great low-level software and reason about hardware.

This role can be based in either Cupertino, CA or Austin, TX.
Job Type: Full-time
Career Level: Mid Level