Staff Embedded Software Engineer

Relativity Space
Long Beach, CA

About The Position

Own the complete storage platform software stack for a space-based data center: custom Linux kernel drivers, OpenZFS pool design, NFS data serving, and automated fault recovery, shipping a platform that preserves up to a petabyte of mission data through years of radiation exposure. Design and implement custom Linux kernel drivers for NVMe fault recovery and GPIO overcurrent protection, working across PCI/PCIe, block layer, and interrupt subsystems to detect and recover from radiation-induced upsets without data loss. Lead the ZFS pool topology architectural decisions by building quantitative reliability models that balance upset probability, resilver risk, and capacity over a 6+ year mission, then validate through fault injection testing. Develop the integration layer between NVMe controller reset and ZFS, ensuring that a drive recovering from a transient fault re-enters the storage pool cleanly, bridging driver-level recovery with filesystem-level fault tolerance. Rapidly prototype on commodity hardware, from first boot through sustained 10 Gbps writes with automated fault recovery, de-risking the architecture before committing to the target platform, then carry the design through integration and launch.

Requirements

  • 5+ years writing Linux kernel code: hands-on driver development involving PCI/PCIe devices, block storage, or interrupt-driven hardware, with meaningful time spent in kernel space
  • Experience with storage systems: ZFS or other copy-on-write filesystems, RAID, NVMe internals, or high-throughput network storage (e.g., NFS)
  • Depth in one or more of the following: filesystem internals, block layer / device management, or storage protocol implementation
  • Strong working knowledge of OS internals: virtual memory, interrupt context constraints, synchronization primitives, and I/O stack behavior

Nice To Haves

  • Hands-on experience at the driver-level hardware/software boundary: DMA coherency, MMIO semantics, PCIe enumeration, and cache behavior
  • Strong working knowledge of data structures and systems reasoning for storage (Merkle trees, NVMe submission/completion queue ring buffers, hash tables, radix trees)
  • Experience testing storage systems, including fault injection (PCIe/NVMe resets, error storms), low-level tracing (ftrace/perf/bpftrace), and crash dump analysis (kdump/vmcore)
  • Experience designing software recovery around storage hardware fault cases, whether that's storage firmware, autonomous vehicle data systems, large-scale distributed infrastructure, or embedded platforms
  • Familiarity with embedded Linux build systems (Yocto or Buildroot) and cross-compilation
  • Hardware lab comfort: serial consoles, logic analyzers, and willingness to debug PCIe enumeration failures on a prototype board alongside the electrical engineers

Responsibilities

  • Own the complete storage platform software stack for a space-based data center: custom Linux kernel drivers, OpenZFS pool design, NFS data serving, and automated fault recovery, shipping a platform that preserves up to a petabyte of mission data through years of radiation exposure
  • Design and implement custom Linux kernel drivers for NVMe fault recovery and GPIO overcurrent protection, working across PCI/PCIe, block layer, and interrupt subsystems to detect and recover from radiation-induced upsets without data loss
  • Lead the ZFS pool topology architectural decisions by building quantitative reliability models that balance upset probability, resilver risk, and capacity over a 6+ year mission, then validate through fault injection testing
  • Develop the integration layer between NVMe controller reset and ZFS, ensuring that a drive recovering from a transient fault re-enters the storage pool cleanly, bridging driver-level recovery with filesystem-level fault tolerance
  • Rapidly prototype on commodity hardware, from first boot through sustained 10 Gbps writes with automated fault recovery, de-risking the architecture before committing to the target platform, then carry the design through integration and launch

Benefits

  • Relativity Space offers competitive salary and equity, a generous PTO and sick leave policy, parental leave, an annual learning and development stipend, and more!