About The Position

AWS's Trainium and Inferentia chips power the world's largest machine learning clusters. Our team builds virtual platforms: full-system C++ and SystemC models of these custom SoCs that let software teams start development months before silicon arrives. For Trainium3, our virtual platform enabled running a full training workload within 12 hours of first silicon. We're looking for a software engineer to build and own the models and infrastructure that make this possible.

Why this role is interesting:

  • You'll own a product that software teams across AWS depend on; they literally can't start development without your virtual platform
  • The engineering challenges are genuinely interesting: full-system simulation, multi-subsystem integration, QEMU development, performance optimization at scale
  • You'll see the direct impact of your work when software teams hit the ground running on new silicon
  • As the team grows, there's a path into architectural modeling: using the platform to explore design alternatives and influence chip architecture
  • Small team, startup pace, big impact inside AWS's custom silicon org

You will thrive in this role if you:

  • Have built functional models, virtual platforms, or system-level simulations for SoCs, ASICs, GPUs, or CPUs
  • Think of yourself as a software engineer first, with deep domain knowledge in chip architecture
  • Are comfortable in C++ or SystemC, and familiar with Python for tooling
  • Care about your customers' experience: you think about usability, documentation, and reliability, not just model accuracy
  • Are interested in expanding into performance or architectural modeling as the team scales
  • Enjoy working on a small, high-impact team where you own significant pieces of the stack

No ML background is needed; you'll learn the ML accelerator domain on the job.

This role can be based in Cupertino, CA or Austin, TX.

Requirements

  • Experience with programming languages such as C/C++, Python, Java, or Perl
  • 2+ years writing functional models, virtual platforms, or system-level simulations for hardware (SoCs, ASICs, GPUs, CPUs)
  • Familiarity with SoC, CPU, GPU, and/or ASIC architecture and micro-architecture
  • 2+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Experience developing for or integrating with QEMU
  • Experience with SystemC, TLM, or transaction-level modeling
  • Experience building simulation infrastructure, CI pipelines, or release tooling
  • Familiarity with modern C++ (C++20 and later)
  • Experience with pytest, GoogleTest, or similar test frameworks
  • Experience with multi-threaded programming
  • Familiarity with firmware, driver, or runtime software development

Responsibilities

  • Build and own functional models of SoC subsystems that integrate into our full-system virtual platform, used by firmware, driver, runtime, and application software teams
  • Design models for usability and performance — your customers are software engineers who need to run real workloads on your platform efficiently
  • Develop and improve the virtual platform infrastructure: QEMU integration, simulation performance, build and release tooling, and customer-facing documentation
  • Work with software teams (your primary customers) to understand their workflows, debug issues on the platform, and shape the model to maximize their productivity
  • Drive simulation performance improvements so the platform can handle increasingly complex workloads at scale
  • Contribute to model architecture decisions — choosing the right level of abstraction and fidelity for each subsystem based on customer needs

Benefits

  • Amazon offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and an option for Supplemental Life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, and Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave.