Staff Storage Systems Administrator

Crusoe, San Francisco, CA

About The Position

At Crusoe, we are on a mission to align the future of computing with the future of the climate. As a Staff Storage Systems Administrator on the Storage Team, you will be the lead architect and operator of the data layer for our vertically integrated AI cloud. This team sits at the critical intersection of massive-scale data ingress/egress and high-performance GPU workloads, ensuring that our sustainable clusters deliver world-class data throughput for the most demanding AI and HPC use cases. You will manage the end-to-end lifecycle of our worldwide storage environment, from initial bring-up and configuration to high-level vendor strategy. In this role, you will have a direct hand in shaping our enterprise infrastructure: collaborating on vendor RFPs, reviewing responses, and working directly with vendors to influence their product roadmaps. Your work ensures that Fortune 500 companies and leading AI researchers have the performant, reliable, and sustainable storage needed to power the AI revolution.

Requirements

  • 10+ years of experience in storage systems administration with a heavy focus on petabyte-scale, on-premises data environments.
  • Strong understanding of storage architectures (block, file, object) and I/O paths.
  • Hands‑on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.).
  • Experience with SSDs, NVMe, RAID, caching, or distributed storage systems.
  • Deep familiarity with enterprise flash arrays and distributed file systems. Specific experience with VAST Data or Pure Storage (Everpure) is highly preferred.
  • Proficiency with scripting (Python, Go, or Bash) to automate array management and monitoring.
  • Ability to analyze complex performance data and present clear conclusions.
  • Proven ability to lead the authoring of technical requirements, evaluate RFP responses, and manage complex vendor relationships.
  • Experience with system design for specific I/O use cases (AI training/inference) and a disciplined approach to testing and validating new vendor releases.
  • A genuine interest in Crusoe’s mission to reduce the environmental impact of the AI revolution through sustainable infrastructure.

Nice To Haves

  • Experience with RDMA, iSCSI, NVMe-oF, RoCEv2, or InfiniBand networking as it relates to high-performance storage.
  • Previous experience at a major Cloud Service Provider (CSP) or a high-scale AI infrastructure company.
  • Familiarity with distributed storage systems (Ceph, Lustre, Gluster, etc.).

Responsibilities

  • Performance Analysis & Optimization: Evaluate the performance of block, file, and object storage systems across diverse workloads. Identify bottlenecks at the hardware, firmware, OS, and application layers. Develop and execute performance test plans, benchmarks, and stress tests. Tune storage stacks (I/O schedulers, caching layers, drivers, protocols) to achieve target KPIs.
  • Validation & Testing: Design and execute Proof of Concept (PoC) exercises to put new arrays through their paces. You will validate new vendor software releases in staging environments before rolling them out to our global production footprint.
  • Full-Stack Administration: Own the initial bring-up, configuration, and ongoing performance tuning of large enterprise arrays. You will manage the lifecycle of the storage OS, ensuring all systems are optimized for AI training and inference I/O patterns.
  • Enterprise Infrastructure Building: Collaborate with the Compute and Networking teams to build a seamless "gold standard" cloud infrastructure. You will design cloud-scale storage systems that can excel in high-concurrency, high-throughput environments.
  • Storage Strategy & Selection: Lead the technical evaluation of new storage technologies. You will be responsible for authoring RFPs, reviewing vendor responses, and leading "down selection" processes to ensure we invest in the best hardware for AI workloads.
  • Vendor Roadmap Influence: Serve as the primary technical point of contact for storage partners (such as VAST Data and Pure Storage). You will sit with their engineering teams to provide feedback on bugs and missing features and to ensure Crusoe’s requirements are prioritized on their development roadmaps.
  • Cross‑Functional Collaboration: Work closely with service engineering and architecture teams to influence design decisions. Provide performance guidance during feature development and release cycles. Communicate findings to both technical and non‑technical stakeholders.

Benefits

  • Industry competitive pay and Series E Restricted Stock Units
  • Health insurance package options (HDHP and PPO), vision, and dental
  • 401(k) with a 100% match up to 4% of salary
  • Paid Parental Leave, Life Insurance, and Disability
  • Company paid commuter benefit ($300/month)
  • Cell phone and tuition reimbursement
  • Subscription to the Calm app and MetLife Legal