Staff Systems Software Engineer, Linux Kernel

Crusoe, San Francisco, CA
Onsite

About The Position

Crusoe is seeking a Staff Linux Kernel Engineer to serve as the primary technical authority for our operating system and hardware-interface layer. This is a "heavy-lifting" engineering role designed for someone who views the Linux kernel not just as an OS, but as a programmable substrate for AI performance. As a Staff Engineer, you will own the most critical paths of our compute stack. You will be responsible for upstream-quality development, custom kernel modules, and the low-level orchestration of memory and I/O that allows our GPU clusters to operate at the theoretical limits of the silicon. You will bridge the gap between bare-metal hardware and the virtualization layer, ensuring that Crusoe's "Metal-as-a-Service" offering is the most performant in the industry.

Requirements

  • 8+ years of deep Systems Programming experience, with at least 5 years focused specifically on Linux Kernel development.
  • Profound understanding of kernel internals, including the VFS, block layer, task scheduling, and interrupt handling.
  • C & Assembly Proficiency: Expert-level mastery of C and the ability to read/debug architecture-specific assembly (x86_64, ARM64).
  • Virtualization-at-the-Metal: Extensive experience with KVM internals and the interaction between the hypervisor and the host kernel.
  • Deep knowledge of SLAB/SLUB allocators, page table management, and NUMA-aware memory allocation strategies.
  • Expertise in eBPF for both observability and networking/security (XDP) applications.
  • Experience leading large-scale architectural shifts and mentoring senior engineers on the nuances of systems-level safety and performance.

Responsibilities

Core Kernel Architecture & Development

  • Architect and implement enhancements to the Linux kernel’s memory management (MM), process scheduler, and I/O stack specifically for high-tenancy AI/HPC workloads.
  • Develop and maintain out-of-tree kernel modules and drivers that manage high-speed interconnects (NVSwitch/NVLink) and hardware accelerators.
  • Identify, backport, and contribute fixes and features to the mainline Linux kernel, ensuring Crusoe remains at the forefront of kernel innovation.

Hardware-Software Co-Design

  • PCIe & IOMMU Interfacing: Lead the implementation of VFIO and SR-IOV strategies to provide secure, near-zero-latency hardware passthrough to virtualized environments.
  • DMA & GPUDirect RDMA: Optimize DMA mapping and memory pinning strategies to facilitate high-speed data transfers between NICs and GPUs without CPU intervention.
  • Hardware Abstraction: Work with hardware vendors to debug and influence the design of firmware and silicon-level features that impact kernel stability and performance.

Performance Observability & Root-Cause Analysis

  • Advanced Profiling: Utilize eBPF, ftrace, and perf to build deep observability into kernel-space bottlenecks and latency spikes.
  • Deep-System Debugging: Lead the investigation into complex system-level failures, including kernel panics, memory leaks, and non-deterministic hardware behavior.
  • Benchmarking: Establish gold-standard performance metrics for kernel-level operations that directly impact AI training times and inference throughput.

Benefits

  • Industry-competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options (HDHP and PPO, vision, and dental)
  • Employer contributions to HSAs
  • Paid Parental Leave & Life Insurance
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • $300/month commuter benefit and tuition reimbursement
  • Subscription to the Calm app and MetLife Legal