Senior Software Engineer, HPC & Optimization (AI-RAN)

DeepSig Inc • Arlington, VA

Hybrid

About The Position

DeepSig is defining the future of wireless by merging deep learning with the Radio Access Network (RAN). We are seeking a Senior Software Engineer (HPC) to build the low-latency, high-performance engine powering our next-generation AI-native RAN. In this role, you will be the bridge between algorithmic innovation, hardware reality, and real-world deployment. You will architect and optimize the critical software pipelines that drive 5G/6G communications, ensuring our AI/ML workloads execute with deterministic microsecond latency on modern NVIDIA platforms. You will solve the "last mile" performance challenges—eliminating jitter, maximizing throughput, and ensuring hard real-time compliance—while working directly with partners to field, debug, and validate these systems in over-the-air (OTA) networks.

Requirements

Education: Bachelor’s or Master’s in Computer Science, Computer Engineering, or Electrical Engineering
C++ Expertise: Expert-level proficiency in Modern C++ (14/17/20), with a deep understanding of template metaprogramming, memory alignment, and lock-free concurrency
Low-Latency Mindset: Deep understanding of latency vs. throughput trade-offs; experience writing code where every microsecond counts (e.g., HFT, Real-Time Graphics, or Embedded Control Systems)
GPU Computing: Strong hands-on experience with CUDA programming, including mixed-precision optimization and stream management
Systems Generalist: Solid grasp of computer architecture fundamentals: PCIe topology, cache coherency, SIMD/AVX instruction sets, and OS kernel interaction
Deployment Grit: Willingness to "get your hands dirty" periodically, debugging complex systems in lab and field environments alongside integration partners

Nice To Haves

Wireless Domain: Understanding of 5G Physical Layer (PHY) processing, OFDM, or baseband algorithms is highly preferred
NVIDIA Ecosystem: Experience with the NVIDIA Aerial SDK (cuBB, cuPHY) or TensorRT
Networking: Experience with DPDK for high-speed packet processing or eCPRI/RoCEv2 networking
Community Experience: History of contributing to or maintaining open-source projects (C++ or Systems focus)

Responsibilities

Latency-Critical Implementation: Translate complex signal processing and AI algorithms into production-grade C++ (17/20) and CUDA code, strictly adhering to 5G TTI (Transmission Time Interval) timing budgets
Field Deployment & Integration: Work closely with customers and partners to deploy, trial, and validate AI-RAN systems in real-world OTA environments. Troubleshoot and debug complex integration and latency issues that arise only in the field
System Profiling & Tuning: Use tools like NVIDIA Nsight Systems to analyze end-to-end execution; identify and resolve "pipeline bubbles," latency spikes, and non-deterministic behavior that threatens real-time stability
Open-Source Collaboration: Engage with the OCUDU open-source community and partner ecosystems. Help polish internal components for public release and drive the transition of open-source tools into robust commercial products
GPU Optimization: Design and tune custom CUDA kernels, managing thread synchronization, warp divergence, and memory hierarchy (Global/Shared/Registers) to ensure maximum GPU saturation without stalling the pipeline
Real-Time Architecture: Configure system parameters for hard real-time performance (NUMA affinity, CPU isolation/pinning, hugepages) to prevent OS scheduler preemption, and implement custom memory allocators to eliminate malloc-induced jitter

Benefits

We offer competitive salaries and benefits, an employee stock option grant program, a flexible schedule, and a great work-life balance, all in a welcoming and inclusive environment where we are excited to be transforming and disrupting how signal processing is done with AI/ML.
