About The Position

We are seeking a highly motivated and innovative Embedded CPU Engineer to join the Platform Architecture team. In this role, you will drive performance and efficiency optimization and architectural feature exploration for Apple’s embedded CPUs, which power critical functions across Apple’s product line. As an Embedded CPU Engineer, you will help define CPUs that are specifically designed for running embedded applications across iPhone, iPad, Mac, and other Apple products.

Your focus will be on understanding the unique constraints and opportunities of varied embedded use cases and translating those insights into improvements for both the software stack and the hardware, including the CPU and its surrounding subsystem. You will be responsible for deep-dive performance analysis of embedded workloads, identifying bottlenecks in existing microarchitectures, and proposing optimization strategies that balance performance, power efficiency, and area. Working closely with algorithm teams, software engineers, and CPU designers, you will explore ISA extensions, microarchitecture enhancements, and system-level optimizations tailored to embedded use cases.

This role requires some background in software profiling, performance modeling, and simulation environments. You will use and develop analysis tools and infrastructure to enable data-driven architectural decisions, create and analyze both real workloads and benchmarks representative of embedded workloads, and iterate with design teams to ensure ideas are implementable within power, timing, and area constraints.

Requirements

  • BS in Electrical Engineering, Computer Engineering, Computer Science, or similar
  • CPU architecture or microarchitecture experience
  • Experience with performance simulation environments and with performance analysis or optimization of workloads
  • Experience with one or more of the following ISAs: ARM, RISC-V, x86
  • Experience in C, C++, or similar programming languages
  • Experience with scripting languages such as Python or Perl for analysis and automation

Nice To Haves

  • MS or PhD in Electrical Engineering, Computer Engineering, or Computer Science
  • 10+ years of industry experience in CPU architecture or performance analysis
  • Expertise in CPU microarchitecture in one or more of the following areas: branch prediction, prefetching, pipeline optimization, datapath, memory hierarchy
  • Experience in one or more of the following areas: embedded ML workloads and inference engines, SIMD/vector architectures for signal processing or ML, or compiler infrastructure and toolchain development for embedded workloads
  • Experience with real-time operating systems and embedded software constraints
  • Understanding of power-performance trade-offs in CPU designs, system-level power management, and low-power design techniques
  • Strong communication and collaboration skills across hardware and software teams
  • Experience taking architectural ideas from concept through implementation

Responsibilities

  • Drive performance and efficiency optimization and architectural feature exploration for Apple’s embedded CPUs.
  • Define CPUs specifically designed for running embedded applications across iPhone, iPad, Mac, and other Apple products.
  • Understand the unique constraints and opportunities of varied embedded use cases and translate those insights into improvements for both the software stack and the hardware, including the CPU and its surrounding subsystem.
  • Perform deep-dive performance analysis of embedded workloads.
  • Identify bottlenecks in existing microarchitectures.
  • Propose optimization strategies that balance performance, power efficiency, and area.
  • Explore ISA extensions, microarchitecture enhancements, and system-level optimizations tailored to embedded use cases.
  • Use and develop analysis tools and infrastructure to enable data-driven architectural decisions.
  • Create and analyze both real workloads and benchmarks representative of embedded workloads.
  • Iterate with design teams to ensure ideas are implementable within power, timing, and area constraints.