Senior Multi-GPU Signal Processing and System Architecture Engineer

NVIDIA • Santa Clara, CA

About The Position

We are seeking a self-motivated senior engineer for the Aerial Omniverse Digital Twin team. This hire will own the design and implementation of the real-time signal-processing subsystem that converts physics-based channel descriptions into received signals for large numbers of emulated devices, on systems of up to thousands of interconnected GPUs. This position offers the opportunity to work on foundational technology for 5G and 6G network simulation, using NVIDIA's world-class compute and interconnect platforms!

Requirements

PhD in high-performance computing, computer architecture, signal processing, or wireless communications (or equivalent experience).
12+ years of relevant experience.
Proficiency in CUDA kernel design with attention to memory hierarchy, register pressure, and HBM bandwidth planning, with a track record of writing production-quality GPU code that meets hard real-time deadlines.
Demonstrated ability to build and reason about data flows across multi-device GPU systems (NVLink, NIC/RDMA) with explicit bandwidth and latency accounting.
Working knowledge of OFDM signal processing and the 5G NR physical layer, sufficient to implement and validate a channel-emulation pipeline.
A record of impactful publications on GPU-accelerated numerical workloads or real-time system design.

Nice To Haves

Experience with GPU-accelerated RAN platforms, L1/L2 software stacks, or channel emulators.
Knowledge of high-bandwidth GPU interconnects (NVLink, NVSwitch) and their scaling properties.
Familiarity with massive MIMO beamformer design and MU-MIMO precoding.

Responsibilities

Design and implement GPU kernels that apply time-varying, multi-antenna channels to OFDM signals under hard real-time deadlines.
Architect the inter-cell data-flow layer, ensuring that the information each cell needs to model interference from its neighbours is compressed, transported, and consumed within the available NVLink and NIC budgets at scale.
Work with the propagation engine and RAN stack teams to orchestrate the end-to-end simulation pipeline, ensuring that propagation updates, channel application, and stack execution remain synchronised across hundreds or thousands of GPUs.
Assess design and implementation trade-offs between physical fidelity, latency, and system scalability.