Own the complete storage platform software stack for a space-based data center: custom Linux kernel drivers, OpenZFS pool design, NFS data serving, and automated fault recovery, shipping a platform that preserves up to a petabyte of mission data through years of radiation exposure.
Design and implement custom Linux kernel drivers for NVMe fault recovery and GPIO overcurrent protection, working across the PCI/PCIe, block layer, and interrupt subsystems to detect and recover from radiation-induced upsets without data loss.
Lead architectural decisions on ZFS pool topology by building quantitative reliability models that balance upset probability, resilver risk, and capacity over a 6+ year mission, then validate them through fault injection testing.
Develop the integration layer between NVMe controller reset and ZFS, ensuring that a drive recovering from a transient fault re-enters the storage pool cleanly, bridging driver-level recovery with filesystem-level fault tolerance.
Rapidly prototype on commodity hardware, from first boot through sustained 10 Gbps writes with automated fault recovery, de-risking the architecture before committing to the target platform, then carry the design through integration and launch.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed