Principal Engineer, High Performance Data & Algorithm Infrastructure

Foresite Labs (Stealth Co) • San Diego, CA

$258,000 - $275,000 • Onsite

About The Position

We are looking for a Principal Engineer to architect, build, and own the end-to-end data pipeline that drives our high-throughput diagnostic instrument platform. That pipeline runs from real-time image acquisition on the instrument, through GPU-accelerated signal processing, to offloading for secondary and tertiary analysis on local HPC clusters and cloud infrastructure. This is a technical leadership role for an engineer who can design and deliver industrial-grade data processing infrastructure that operates reliably at sustained high throughput.

You will be responsible for the full data path: acquiring raw image data from sensors, processing it through GPU pipelines, orchestrating job distribution across local HPC and cloud compute, and ensuring the entire system handles errors, backpressure, and recovery gracefully. The scope spans instrument-embedded software, on-premises Linux HPC infrastructure, and cloud-based compute and storage.

The central challenge of this role is not raw compute optimization; GPU and CPU resources will have adequate headroom. The challenge is building a pipeline architecture that stays robust, scalable, and evolvable as instrument throughput increases with each generation, the number of instruments grows, and data volumes scale accordingly. You will design systems that keep a complex multi-stage pipeline running continuously and reliably in a production lab environment, and that can evolve without wholesale re-architecture as requirements intensify.

Requirements

12+ years of professional software engineering experience in performance-critical systems
Track record of architecting and delivering complex, multi-stage data processing pipelines
Demonstrated technical leadership — ability to drive architecture decisions and mentor engineers
Experience operating systems at industrial-grade reliability and throughput requirements
Expert-level C/C++ and systems programming on Linux
Solid experience with CUDA programming and GPU pipeline development (required)
Strong understanding of computer architecture: CPU caches, NUMA, memory hierarchies, PCIe, DMA
Experience with Python for tooling, orchestration, and pipeline glue
Experience with performance profiling and diagnostics tools (perf, ftrace, Nsight, or similar)
Experience designing multi-stage data pipelines with flow control, buffering, and backpressure management
Strong understanding of error handling, retry strategies, and fault recovery in performance-critical systems
Experience with job scheduling and work distribution across heterogeneous compute resources
Practical experience implementing DSP or image processing algorithms in production systems
Familiarity with frequency-domain analysis, filtering, and detection algorithms
Ability to reason about numerical accuracy and throughput tradeoffs
Experience optimizing data transfer across high-speed networks (RDMA, InfiniBand, high-speed Ethernet)
Understanding of shared storage architectures, tiered storage strategies, and high-throughput data staging
Experience defining compute platform requirements and collaborating effectively with infrastructure teams
Familiarity with algorithm deployment and versioning in production compute environments
BS/MS in Computer Science, Electrical Engineering, or a related field

Nice To Haves

Experience with data pipelines for high-throughput diagnostic, imaging, or other scientific instruments
Experience scaling a data pipeline through multiple hardware or throughput generations
Experience with GPUDirect RDMA or other hardware offload technologies
Familiarity with real-time or low-latency Linux variants
Background in scientific computing, computational physics, or bioinformatics
Experience designing systems that span embedded instrument software and datacenter infrastructure
PhD preferred
Familiarity with workflow orchestration frameworks (Airflow, Celery, custom solutions, or similar)

Responsibilities

End-to-End Data Pipeline Architecture
Own the architecture of the complete data path from image acquisition to final processed output
Design pipeline stages with clear interfaces, flow control, and backpressure mechanisms
Ensure the pipeline sustains continuous high-throughput operation across extended instrument runs
Define data formats, handoff protocols, and buffering strategies between pipeline stages
Architect for graceful degradation: the system must handle transient failures without data loss or pipeline stalls
Establish performance budgets and SLAs for each pipeline stage and monitor adherence
Image Acquisition & On-Instrument Processing
Develop and optimize real-time image acquisition from high-speed sensors on the instrument
Implement low-latency, high-bandwidth data capture with minimal frame loss
Design on-instrument preprocessing stages that reduce data volume before offload
Manage memory and storage constraints within the instrument compute environment
Ensure deterministic, repeatable performance under sustained acquisition loads
GPU-Accelerated Signal & Image Processing
Develop and maintain GPU compute pipelines using CUDA for signal and image processing
Implement DSP algorithms including frequency-domain analysis, deconvolution, filtering, and detection
Manage host-to-GPU data transfers and ensure efficient use of GPU resources
Profile GPU workloads to identify issues and validate performance headroom
Balance numerical accuracy against throughput requirements
Job Orchestration & Distributed Processing
Design and implement job queuing, scheduling, and orchestration across instrument, local HPC, and cloud compute
Build robust work distribution that maximizes resource utilization across heterogeneous compute
Implement backpressure handling so upstream stages throttle gracefully when downstream is saturated
Design comprehensive error handling, retry logic, and dead-letter strategies for failed jobs
Ensure jobs are idempotent and recoverable, so that partial failures cannot corrupt the pipeline
Implement priority scheduling to balance real-time instrument processing with batch reprocessing
Monitor queue depths, processing latencies, and resource utilization with actionable alerting
Linux Systems & Performance
Configure and tune Linux systems for reliable, high-throughput operation across instrument and HPC nodes
Tune kernel parameters (scheduler, NUMA, IRQs, huge pages) as needed for stable pipeline performance
Understand and manage DMA paths, PCIe topology, and device-to-memory data movement
Profile and diagnose system-level issues using perf, ftrace, eBPF, and similar tools
Ensure system configurations are reproducible and documented across instrument and HPC environments
HPC Compute Platform & Algorithm Infrastructure (co-owned with DevOps)
Co-design the HPC compute platform architecture with DevOps: define computational requirements, job flow, and data access patterns, while DevOps provisions and manages the infrastructure
Define how algorithms are deployed, versioned, and rolled into production on the HPC platform, including safe side-by-side execution of new and existing algorithm versions
Design compute allocation strategies that balance real-time instrument processing, batch algorithm development/validation, and historical data reprocessing
Design the data handoff between instrument-side processing and HPC/cloud compute, including formats, staging, and transfer protocols
Define storage tiering requirements for the processing pipeline: what data stays in hot storage for active processing, what moves to warm storage for algorithm development access, and what is archived to cold storage
Specify when and how workloads should burst from local HPC to cloud (AWS) based on pipeline load and priority
Optimize data movement across high-speed networks (RDMA, InfiniBand, high-speed Ethernet) between instrument, HPC, and storage
Design for scalability: the architecture must accommodate increasing instrument throughput, additional instruments, and growing algorithm complexity
Reliability & Observability
Instrument every pipeline stage with metrics, logging, and tracing
Build real-time dashboards showing pipeline health, throughput, latency, and queue state
Design automated recovery mechanisms for common failure modes
Implement data integrity checks and validation at pipeline stage boundaries
Support root-cause analysis and post-mortem investigation for pipeline incidents
Establish runbooks and operational procedures for pipeline operations