Senior Software Engineer, HPC & Optimization (AI-RAN)

DeepSig Inc, Arlington, VA
Hybrid

About The Position

DeepSig is defining the future of wireless by merging deep learning with the Radio Access Network (RAN). We are seeking a Senior Software Engineer (HPC) to build the low-latency, high-performance engine powering our next-generation AI-native RAN. In this role, you will be the bridge between algorithmic innovation, hardware reality, and real-world deployment. You will architect and optimize the critical software pipelines that drive 5G/6G communications, ensuring our AI/ML workloads execute with deterministic microsecond latency on modern NVIDIA platforms. You will solve the "last mile" performance challenges—eliminating jitter, maximizing throughput, and ensuring hard real-time compliance—while working directly with partners to field, debug, and validate these systems in over-the-air (OTA) networks.

Requirements

  • Education: Bachelor’s or Master’s in Computer Science, Computer Engineering, or Electrical Engineering
  • C++ Expertise: Expert-level proficiency in Modern C++ (14/17/20), with a deep understanding of template metaprogramming, memory alignment, and lock-free concurrency
  • Low-Latency Mindset: Deep understanding of latency vs. throughput trade-offs; experience writing code where every microsecond counts (e.g., HFT, Real-Time Graphics, or Embedded Control Systems)
  • GPU Computing: Strong hands-on experience with CUDA programming, including mixed-precision optimization and stream management
  • Systems Generalist: Solid grasp of computer architecture fundamentals, including PCIe topology, cache coherency, SIMD/AVX instruction sets, and OS kernel interaction
  • Deployment Grit: Willingness to periodically "get your hands dirty" debugging complex systems in lab and field environments, working alongside integration partners

Nice To Haves

  • Wireless Domain: Understanding of 5G Physical Layer (PHY) processing, OFDM, or baseband algorithms is highly preferred
  • NVIDIA Ecosystem: Experience with the NVIDIA Aerial SDK (cuBB, cuPHY) or TensorRT
  • Networking: Experience with DPDK for high-speed packet processing or eCPRI/RoCEv2 networking
  • Community Experience: History of contributing to or maintaining open-source projects (C++ or Systems focus)

Responsibilities

  • Latency-Critical Implementation: Translate complex signal processing and AI algorithms into production-grade C++ (17/20) and CUDA code, strictly adhering to 5G TTI (Transmission Time Interval) timing budgets
  • Field Deployment & Integration: Work closely with customers and partners to deploy, trial, and validate AI-RAN systems in real-world OTA environments. Troubleshoot and debug complex integration and latency issues that arise only in the field
  • System Profiling & Tuning: Use tools like NVIDIA Nsight Systems to analyze end-to-end execution; identify and resolve "pipeline bubbles," latency spikes, and non-deterministic behavior that threaten real-time stability
  • Open-Source Collaboration: Engage with the OCUDU open-source community and partner ecosystems. Help polish internal components for public release and drive the transition of open-source tools into robust commercial products
  • GPU Optimization: Design and tune custom CUDA kernels, managing thread synchronization, warp divergence, and memory hierarchy (Global/Shared/Registers) to ensure maximum GPU saturation without stalling the pipeline
  • Real-Time Architecture: Configure system parameters for hard real-time performance (NUMA affinity, CPU isolation/pinning, hugepages) to prevent OS scheduler preemption, and implement custom memory allocators to avoid malloc-induced jitter

Benefits

  • We offer competitive salaries and benefits, an employee stock option grant program, a welcoming and inclusive environment, a flexible schedule, and a great work-life balance, on a team excited to be transforming and disrupting how signal processing is done with AI/ML.