About The Position

The Senior SoC Software DevOps Engineer role centers on enabling the rapid and reliable development of software for AWS's most advanced custom machine learning chips. This position is critical to supporting the Trainium and Inferentia families of silicon, which power large-scale AI training at AWS. The engineer will serve as the primary owner of infrastructure that directly affects how quickly software teams can iterate on code for both pre-silicon simulation environments and post-silicon production deployments. By building robust automation and tooling, the role ensures that tape-outs for new chips stay on schedule and that software is ready to function immediately when first silicon becomes available. This work has a direct impact on AWS's ability to deliver advanced ML infrastructure to its largest customers.

This role operates at the intersection of hardware and software, requiring deep expertise in infrastructure engineering to solve unique challenges such as coordinating releases across isolated environments and validating firmware on real silicon. It is a foundational position for the SoC software teams, as it frees engineers from infrastructure burdens and allows them to focus on feature development. Success in this role will be measured by improvements in development velocity, release quality, and the stability of systems that support multiple teams. The position demands a proactive approach to identifying bottlenecks and a strong ability to operate within novel technical contexts without prior domain knowledge in machine learning or chip design.

We're part of the SoC Software organization within Annapurna Labs (AWS). Our three software teams, uCode, HAL (Hardware Abstraction Layer), and Modeling, build the firmware, drivers, and virtual platforms for AWS's custom ML accelerator chips. We operate like a startup: small teams, high ownership, direct impact on AWS's most strategic silicon programs.

This DevOps engineer will work across all three teams, with a mandate to improve velocity, quality, and developer experience for the entire SoC software organization. Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship.

Requirements

  • 7+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Experience as a mentor or tech lead, or leading an engineering team
  • Experience programming in Python and at least one of: Bash, Go, C++, or Java
  • Experience with infrastructure-as-code (CDK, CloudFormation, Terraform, etc.)
  • Experience with AWS services (Lambda, S3, EC2, CloudWatch, IAM, Secrets Manager, etc.)
  • Experience with Linux-based build and development environments

Nice To Haves

  • Bachelor's degree in computer science or equivalent
  • Experience with Amazon's internal build and release systems (Brazil, Pipelines, CRUX, Apollo, etc.)
  • Experience building cross-environment or cross-account automation (e.g., bridging corporate and isolated/air-gapped environments)
  • Experience with Jenkins pipeline development and administration
  • Experience with hardware-in-the-loop testing or supporting hardware/silicon development teams
  • Experience building observability infrastructure: dashboards, metrics pipelines, alerting (CloudWatch, QuickSight, or similar)

Responsibilities

  • The engineer will own the end-to-end CI/CD pipelines and release processes for all SoC software components, including firmware, hardware abstraction layers, and modeling tools.
  • This involves designing, maintaining, and evolving systems that produce reliable releases for both internal verification teams and external AWS services.
  • A key task is ensuring these pipelines function across heterogeneous environments such as Corp networks and VPC.
  • The role requires building qualification workflows that guarantee software meets strict quality standards before reaching customers or verification teams.
  • Another core duty is developing hardware-in-the-loop test infrastructure that validates SoC software on actual silicon in laboratory and automated testing settings.
  • This includes creating frameworks to run tests on real chips, simulate pre-silicon environments, and integrate results into continuous integration workflows.
  • Additionally, the engineer must build observability tools, such as dashboards that track build health, test coverage, and pipeline performance, along with alerting systems that notify teams of regressions.
  • A significant focus will be on identifying and removing friction in development workflows, such as slow build times or complex release steps, using data-driven insights to prioritize improvements that accelerate team productivity.
  • The role also involves solving novel problems, like bridging disconnected environments and orchestrating synchronized releases across multiple domains.

Benefits

  • health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave