Staff Engineer, High Performance Data & Algorithm Infrastructure

Foresite Labs (Stealth Co), San Diego, CA
Onsite

About The Position

We are looking for a Senior Staff Software Engineer with deep expertise in high-performance computing (HPC), Linux systems, and GPU-accelerated data pipelines. This is a highly technical, hands-on role focused on extracting maximum performance from modern CPUs, GPUs, memory subsystems, and high-speed networks. You will work close to the hardware and operating system, tuning kernels, BIOS settings, and drivers, while also designing and implementing low-latency data processing pipelines that include real-time signal processing. If you enjoy profiling, tuning, and eliminating bottlenecks across the full stack, from BIOS to CUDA kernels to network offload, this role is for you.

Requirements

  • 7+ years of professional software engineering experience (or equivalent depth)
  • Strong background in high-performance computing or performance-critical systems
  • Expert-level Linux experience, including kernel and system tuning
  • Deep experience with GPU computing and CUDA (required)
  • Strong systems programming skills in C/C++ (and/or Rust)
  • Solid understanding of computer architecture:
      • CPU caches, NUMA, and memory hierarchies
      • PCIe and DMA
      • GPU architectures
  • Extensive experience profiling and tuning complex systems
  • Comfortable using tools such as perf, ftrace, eBPF, Valgrind, Nsight, and similar
  • Ability to reason quantitatively about latency, bandwidth, and throughput
  • Practical experience implementing DSP algorithms in production systems
  • Strong understanding of FFTs, convolution/deconvolution, filtering, and thresholding
  • Ability to optimize numerical algorithms for real-time or near-real-time constraints
  • BS/MS in Computer Science or Engineering
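As a small illustration of the quantitative latency/bandwidth reasoning called for above, here is a back-of-envelope sketch; the bandwidth figures are illustrative assumptions, not measurements of any particular system:

```python
# Back-of-envelope estimate of data-movement time across common links.
# All bandwidth numbers below are rough, assumed figures for illustration;
# real tuning work starts from measured numbers on the actual hardware.

GIB = 1024 ** 3

def transfer_time_ms(bytes_moved: int, bandwidth_gib_s: float) -> float:
    """Milliseconds to move `bytes_moved` at a sustained bandwidth (GiB/s)."""
    return bytes_moved / (bandwidth_gib_s * GIB) * 1e3

payload = 4 * GIB  # e.g. one batch of captured samples

# Assumed sustained (not peak) bandwidths in GiB/s.
links = {
    "DRAM (one channel)": 30.0,
    "PCIe Gen4 x16 (effective)": 24.0,
    "100 GbE (effective)": 11.0,
}

for name, bw in links.items():
    print(f"{name}: {transfer_time_ms(payload, bw):.1f} ms")
```

Sketches like this bound where a pipeline's time must be going before any profiler is attached.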

Nice To Haves

  • Experience with RDMA, GPUDirect RDMA, or other hardware offload technologies
  • Experience with custom kernel builds or kernel module development
  • Familiarity with real-time or low-latency Linux variants
  • Experience deploying HPC workloads at scale
  • Background in scientific computing, signal processing, or computational physics

Responsibilities

  • Design, build, and optimize high-throughput, low-latency compute pipelines
  • Profile and tune performance across CPUs, GPUs, memory, storage, and networking
  • Identify and eliminate bottlenecks in data movement and computation
  • Work directly with hardware and OS configuration to achieve deterministic, repeatable performance
  • Configure and tune Linux systems for high-performance workloads
  • Customize and tune Linux kernel parameters (scheduler, NUMA, IRQs, huge pages, IOMMU, etc.)
  • Tune CPU and BIOS parameters (power states, frequency scaling, SMT, NUMA, memory timing)
  • Manage and optimize DMA paths between devices and system memory
  • Minimize context switches, cache misses, and system jitter
  • Develop and optimize GPU-accelerated compute pipelines using CUDA
  • Optimize memory transfers between host and GPU (pinned memory, zero-copy, GPUDirect where applicable)
  • Tune kernel launches, memory access patterns, and occupancy
  • Configure and manage GPU drivers, runtime, and system-level settings for maximum throughput
  • Profile GPU workloads using tools such as Nsight Systems and Nsight Compute
  • Optimize high-speed data ingestion and offload to HPC systems
  • Work with low-latency and high-bandwidth networking technologies (e.g., RDMA, InfiniBand, high-speed Ethernet)
  • Minimize data transfer latencies across network, PCIe, and memory boundaries
  • Design zero-copy or near-zero-copy data paths where possible
  • Implement and optimize digital signal processing algorithms, including:
      • FFTs
      • Deconvolution
      • Thresholding and detection algorithms
  • Optimize DSP workloads for CPU vectorization and GPU acceleration
  • Balance numerical accuracy, latency, and throughput constraints
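The FFT and thresholding/detection responsibilities above can be sketched in miniature. The following is a pure-Python radix-2 FFT feeding a simple magnitude-threshold detector, purely illustrative; production code would use vectorized CPU or CUDA FFT libraries, and the signal and threshold here are made up for the example:

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # even-indexed samples
    odd = fft(x[1::2])   # odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def detect_tones(samples, threshold):
    """Return FFT bin indices whose magnitude exceeds `threshold`."""
    spectrum = fft(samples)
    return [k for k, v in enumerate(spectrum) if abs(v) > threshold]

# A pure 2-cycle sine over 16 samples concentrates energy in bins 2 and 14.
n = 16
signal = [cmath.sin(2 * cmath.pi * 2 * t / n).real for t in range(n)]
print(detect_tones(signal, threshold=4.0))
```

The same structure, with the transform and the threshold scan swapped for library or CUDA kernels, is the skeleton of an FFT-based detection pipeline.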

Benefits

  • Competitive compensation and equity package, comprehensive benefits, and flexibility to support work-life integration.