About The Position

Do you want to help drive the development of CPU architectures to fuel the explosive growth in artificial intelligence (AI) / deep learning (DL), high-performance computing (HPC), gaming, virtual reality, and autonomous vehicles? Come join the CPU performance architecture team as a Senior System Simulation Architect and help us push performance boundaries for NVIDIA’s line of CPU products!

NVIDIA is a global leader in accelerated computing, delivering breakthroughs in AI, HPC, and advanced system design. Our technologies power transformative applications across industries, from robotics and autonomous vehicles to healthcare and climate research. With the introduction of the Grace CPU Superchip and, more recently, the announcement of the Vera CPU, NVIDIA has expanded into the CPU server market, complementing our world-class GPUs and SoCs. These CPUs play a critical role in orchestrating complex workloads with exceptional performance-per-watt efficiency. The CPU architecture team is driving innovations that integrate seamlessly with NVIDIA’s broader technology stack, enabling faster AI model training, agentic use cases, efficient data processing, and scalable cloud deployments.

Requirements

  • BS/MS in EE, CE, or CS or equivalent experience
  • 6 or more years of relevant experience
  • Excellent C/C++/Python programming skills
  • Experience in development of functional simulators and/or low-level software (OS, firmware, drivers); preferably both
  • Excellent debugging skills – of both system software/firmware and application software
  • Experience with the ARM ISA
  • Excellent communication and teamwork skills

Nice To Haves

  • Experience working with hardware emulators and/or FPGAs
  • Background in CPU workload analysis (SimPoint, etc.)
  • Experience with Linux kernel bring-up and debug
  • Familiarity with CUDA
  • Experience with CPU/GPU application development and optimization in PyTorch, TensorFlow, and similar frameworks

Responsibilities

  • Develop full-system functional models capable of running complex multi-threaded heterogeneous (CPU/GPU) workloads – with special focus on the CPU subsystem.
  • Integrate functional models from various frameworks with RTL simulators and emulators, hardware (HW-in-the-loop), and detailed performance models.
  • Bring up system and application software in simulation and emulation – including firmware, Linux, drivers, benchmarks, and CPU/GPU workloads such as deep-learning (DL) and high-performance computing (HPC) workloads.
  • Port/extend/develop system software (firmware, OS, and drivers) to meet workload simulation needs.
  • Support CPU architects and performance engineers in their use of functional models, performance models, and emulation to drive next-generation CPU architectures.

Benefits

  • Equity
  • Benefits