Sr Software Development Engineer - Silicon Development Infrastructure, ML Silicon Infrastructure

Amazon • Austin, TX

About The Position

We're seeking a Senior Silicon Software Development Infrastructure Engineer to architect, build, and operate the infrastructure that accelerates silicon development at Annapurna Labs. In this role, you'll design and deliver the platforms, tooling, and automation that enable our chip design teams to iterate faster, validate more thoroughly, and bring transformative silicon to market. You'll work at the intersection of cloud infrastructure, high-performance computing, and electronic design automation, building systems that directly impact AWS's ability to innovate in custom silicon. This is a unique opportunity to shape infrastructure that supports chip development while working with world-class engineers across hardware, software, and operations disciplines.

Requirements

5+ years of professional software and systems development experience
Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent work experience
Strong programming skills in Python or similar languages with demonstrated software engineering best practices
Familiarity with semiconductor development workflows and electronic design automation (EDA) tools in domains such as design verification, physical design, emulation, or formal verification
Experience designing, building, and operating cloud infrastructure with infrastructure-as-code methodologies
Solid understanding of networking, security, performance optimization, and distributed systems fundamentals
Experience with CI/CD systems such as Jenkins, GitLab CI, or similar platforms
Clear communication skills, with the ability to explain technical tradeoffs, propose solutions, and collaborate effectively across teams

Nice To Haves

Experience with operating system-level debugging and performance optimization, including NUMA node configuration, memory topology tuning, and system resource allocation strategies
Experience operating AWS cloud environments at scale with deep knowledge of EC2, VPC, IAM, and related services
Experience designing and operating high-performance computing (HPC) or high-throughput computing (HTC) clusters with workload schedulers like Slurm
Hands-on experience with backend systems including message queues, caching layers, artifact repositories, or internal service platforms
Knowledge of enterprise authentication systems such as Entra ID, LDAP, FreeIPA, or SSSD
Experience with high-performance storage architectures and optimizing data movement for large-scale workloads
Familiarity with license server management for capacity-constrained or expensive commercial toolchains
Track record of driving operational excellence through monitoring, incident response, and continuous improvement

Responsibilities

Partner directly with silicon design, verification, emulation, formal verification, and software teams to deeply understand their development workflows, pain points, and iteration cycles
Build customer-facing tooling including command-line interfaces, REST APIs, and automation services that eliminate manual toil and reduce time-to-results
Gather continuous feedback from internal customers and rapidly iterate on solutions
Benchmark infrastructure against silicon development workflows to give internal customers the optimal resources for their work
Design, implement, and operate cloud infrastructure (AWS preferred) and high-performance computing clusters using schedulers like Slurm
Build and maintain CI/CD pipelines for infrastructure-as-code, container images, service deployments, and cluster configuration changes with comprehensive testing, staged rollouts, and safe rollback mechanisms
Take full ownership of platform reliability, performance, and cost efficiency—from initial design through production operation and continuous improvement
Develop data pipelines that ingest metrics, logs, and workflow results from distributed systems
Design and operate databases that capture workflow metadata, job outcomes, and resource utilization patterns
Build dashboards and alerting systems that surface actionable insights on efficiency, utilization, reliability, and cost trends
Establish monitoring, incident response processes, runbooks, and documentation that enable operational excellence
Identify opportunities to simplify workflows and reduce complexity in silicon development infrastructure
Design pragmatic, scalable solutions that balance immediate needs with long-term maintainability
Challenge assumptions and propose innovative approaches to infrastructure problems, always asking "what's the simplest thing that could work?"
Build reusable abstractions and platforms that eliminate repetitive work across multiple teams and chip programs

Benefits

Health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, adoption and surrogacy reimbursement coverage)
401(k) matching
Paid time off
Parental leave
