Senior Staff Machine Learning Software Engineer

Foresite Labs (Stealth Co), San Diego, CA
Onsite

About The Position

We are building a product where learned models and compute-heavy inference components have to run inside a tight local runtime budget. Research code is only the starting point. This role owns the path from a working prototype to production inference that is measured, packaged, tested, and ready for repeated use in the field. You will work closely with the people developing the underlying algorithms, but your ownership is different: production readiness, performance, reliability, and the engineering boundary between exploratory model work and shipped execution. The strongest fit is someone who can explain the bottleneck they found, the number they moved, the tradeoff they accepted, and the test that kept the fix from regressing. If your best work is making inference faster, smaller, more predictable, and easier to ship, this role is likely a good match.

Requirements

  • A PhD with 6+ years, an MS with 10+ years, or a BS/BA with 12+ years of experience in life sciences or technology.
  • Demonstrated leadership or ownership in at least 2 of the 5 areas below:
  • Shipped constrained inference. You have personally moved a model or learned component from prototype to deployed runtime with a real latency, throughput, memory, or power budget. You can name the target, the bottleneck, and the change that closed the gap.
  • Rust/C++ at shipping depth. You have written production code in Rust or modern C++ where correctness, latency, memory layout, and ownership boundaries mattered. You can reason about the runtime behavior of the code you ship, not just its API surface.
  • CUDA and accelerator-aware execution. You are comfortable below Python: custom CUDA extensions or kernels, host/device memory movement, launch overhead, profiler traces, and the practical tradeoffs between framework convenience and a purpose-built implementation.
  • Performance-native judgment. You reason in wall-clock time, memory movement, launch overhead, bandwidth, numerical precision, and error budgets without needing those constraints added late in review.
  • Production engineering discipline. You define typed interfaces, deterministic behavior, reproducible artifacts, meaningful tests, and clean handoffs with upstream research code.
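To make concrete the kind of engineering boundary described above (typed interfaces, deterministic behavior, explicit budgets), here is a minimal illustrative sketch in Rust. All names here are hypothetical, invented for illustration; this is not code from any Foresite Labs system.

```rust
use std::time::{Duration, Instant};

/// Hypothetical contract for a shipped inference component:
/// typed inputs/outputs plus an explicit wall-clock budget.
trait InferenceComponent {
    fn infer(&self, input: &[f32]) -> Vec<f32>;
    fn latency_budget(&self) -> Duration;
}

/// Toy stand-in for a learned component: a single linear layer.
struct LinearModel {
    weights: Vec<f32>,
    budget: Duration,
}

impl InferenceComponent for LinearModel {
    fn infer(&self, input: &[f32]) -> Vec<f32> {
        // Deterministic dot product; no hidden allocation or nondeterminism.
        vec![input
            .iter()
            .zip(&self.weights)
            .map(|(x, w)| x * w)
            .sum()]
    }
    fn latency_budget(&self) -> Duration {
        self.budget
    }
}

/// Release-facing gate: returns false if inference exceeds its budget,
/// so a CI check can block a change that regresses latency.
fn check_budget<C: InferenceComponent>(c: &C, input: &[f32]) -> bool {
    let start = Instant::now();
    let _ = c.infer(input);
    start.elapsed() <= c.latency_budget()
}

fn main() {
    let model = LinearModel {
        weights: vec![0.5, -0.25, 1.0],
        budget: Duration::from_millis(5),
    };
    let input = [1.0_f32, 2.0, 3.0];
    println!("output = {:?}", model.infer(&input));
    println!("within budget = {}", check_budget(&model, &input));
}
```

The point of the sketch is the shape of the contract, not the math: the budget lives next to the component, and the gate is ordinary test code rather than a note in a review comment.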

Nice To Haves

  • Rust at shipping depth, especially FFI boundaries, pyo3 / maturin, async runtimes, or performance-sensitive service code
  • Inference on constrained local hardware, embedded systems, edge devices, or budget-bound accelerator deployments
  • Quantization, mixed precision, model compression, or kernel fusion that shipped beyond a benchmark notebook
  • Calibration or confidence estimation used on production outputs, with monitoring or regression checks attached
  • Public or shareable evidence of engineering quality: code, technical writing, postmortems, talks, or a concrete shipped system you can discuss
  • Comfort using AI-assisted development tools while still owning correctness, tests, and review quality
  • Real-time or near-real-time signal-processing systems
  • Products that combine learned models with deterministic numerical code
  • Rust- or C++-based inference or numerical pipelines, including custom FFI to CUDA, cuDNN, TensorRT, or similar accelerator libraries

Responsibilities

  • Turning research prototypes into production inference components with explicit latency, throughput, memory, and accuracy budgets
  • Optimizing the execution path: tensor layout, host/device transfers, batching strategy, kernel launch overhead, mixed precision, quantization, and memory reuse
  • Writing or tuning Rust, C++, and CUDA where framework-level optimization is not enough, then validating the improvement with profiler output and release-facing tests
  • Building inference-adjacent evaluation machinery: calibration checks, confidence behavior, regression detection, dataset slices, and failure-mode reporting tied to product metrics
  • Maintaining the deployment contract: model artifacts, runtime integration, versioning, reproducibility, and performance gates that block unsafe changes
  • Collaborating with algorithm research and novel model design teams, translating prototypes into production constraints, and surfacing shipping risks early when a design needs to change
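As one concrete instance of the precision and quantization work listed above, here is a minimal sketch of symmetric per-tensor int8 quantization in Rust. The helper names are hypothetical and this is deliberately simplified (no per-channel scales, no calibration data); it illustrates the accuracy/size tradeoff, not a production implementation.

```rust
/// Quantize f32 values to i8 with a single symmetric scale,
/// chosen so the largest magnitude maps to 127.
fn quantize(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Guard against an all-zero tensor.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = values.iter().map(|v| (v / scale).round() as i8).collect();
    (q, scale)
}

/// Recover approximate f32 values from the int8 representation.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let x = [0.1_f32, -0.5, 1.27, 0.0];
    let (q, scale) = quantize(&x);
    let x_hat = dequantize(&q, scale);
    // Reconstruction error is bounded by half a quantization step.
    let max_err = x
        .iter()
        .zip(&x_hat)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f32, f32::max);
    println!("scale = {scale}, max_err = {max_err}");
    assert!(max_err <= scale / 2.0 + f32::EPSILON);
}
```

A release-facing version of this would pair the quantized path with a regression check like the error bound above, run over representative dataset slices rather than a toy array.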