Principal Firmware Engineer, Annapurna Labs ML Acceleration Systems Software

Amazon
Austin, TX
$144,100 - $194,900
Onsite

About The Position

At Annapurna Labs, we are at the forefront of hardware/software accelerator solutions, not only for Amazon Web Services (AWS) but across the industry. The Machine Learning Acceleration Systems Firmware team is looking for candidates interested in diving deep into our Machine Learning server designs and developing world-class firmware to support current and future generations of accelerator silicon. Our team designs and builds Annapurna's fleet of Accelerated Servers using internally designed silicon. We solve systemic hardware issues, and we build hardware and software systems to detect and mitigate future recurrences of failures so that our customers experience the highest quality of service possible.

In this role, you will lead an organization of software and firmware developers to build reliable server firmware deployed on millions of accelerators across EC2. You will build AI-driven software tooling that root-causes system failures, work that directly impacts how our customers leverage AWS Trainium for their machine learning workloads.

Requirements

  • 7+ years of experience working directly with engineering teams
  • Experience managing programs across cross-functional teams, building processes, and coordinating release schedules
  • Experience building and evaluating system-level technical design
  • Bachelor's degree in Computer Science, Computer Engineering, or related fields
  • Experience managing teams, or experience as a mentor, tech lead, or leader of an engineering team
  • Experience in software development or in troubleshooting and debugging technical systems, with strong analytical skills, attention to detail, and effective communication abilities
  • Experience with hardware/software integration and real-time systems
  • 10+ years of systems software or firmware engineering experience
  • Proficiency with programming languages commonly used in systems software (such as C, C++, Rust, or Python)

Nice To Haves

  • 5+ years of experience in project management disciplines, including scope, schedule, budget, quality, risk, and critical path management
  • Experience managing projects across cross-functional teams, building sustainable processes, and coordinating release schedules
  • Experience defining KPIs/SLAs used to drive multimillion-dollar businesses and reporting to senior leadership
  • Master's degree in Computer Science, Computer Engineering, or related fields
  • Experience troubleshooting and debugging technical systems
  • 5+ years of embedded firmware development experience
  • Knowledge of data center infrastructure design, operations, or delivery
  • Experience navigating a knowledge base and following Standard Operating Procedures (SOPs)
  • Experience with AI or machine learning applications in systems engineering

Responsibilities

  • Lead a team of software and firmware developers to design and develop server software at AWS scale.
  • Collaborate with hardware developers and software engineers to design validation strategies that ensure reliability across our entire product line.
  • Mentor your team through complex technical challenges.
  • Establish operational procedures that scale across products.
  • Work cross-functionally to integrate design-for-excellence principles into our development process.
  • Participate in technical discussions that shape how we approach system design & validation, ensuring we're catching issues before they reach customers.
  • Interface with our internal and external customers to understand project requirements and facilitate system development on top of your server design.
  • Learn the operational challenges of our existing fleet, with the goal of improving the current customer experience and developing improved systems for future designs.
  • Work directly with vendors and ODM/JDM design teams to develop and manufacture your product at scale.
  • Drive and measure process improvements that enhance our operational effectiveness using data and key metrics.

Benefits

  • health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave