About The Position

TensorWave is building and operating large-scale infrastructure platforms to support high-performance AI and machine learning workloads across multiple data centers. Their environment includes GPU-intensive systems, high-throughput networking, and distributed storage platforms designed for performance, scale, and resilience.

They are looking for a Storage Operations Engineer to own the day-to-day operation, performance, and reliability of their storage platforms. This role is responsible for ensuring that storage systems remain stable, performant, and aligned with the demands of Kubernetes, AI/ML, and high-performance compute workloads. This is not a traditional SAN/NAS administration role. The engineer will work with modern distributed storage systems and be expected to troubleshoot, optimize, and scale them in production environments.

Requirements

  • 4–7+ years of experience in infrastructure, systems, or storage operations
  • Strong hands-on experience operating distributed storage systems in production
  • Experience with Ceph (RBD, CephFS, or RGW)
  • Experience with modern storage platforms such as Weka, VAST Data, or similar high-performance systems
  • Strong Linux systems knowledge
  • Solid understanding of storage performance characteristics (IOPS, throughput, latency)
  • Solid understanding of data replication and failure domains
  • Ability to troubleshoot across storage systems, network paths, and compute clients

Nice To Haves

  • Experience supporting AI/ML or HPC workloads
  • Familiarity with NVMe-based storage architectures
  • Familiarity with RDMA or high-throughput Ethernet environments
  • Experience integrating storage with Kubernetes
  • Experience operating storage across multiple data centers
  • Exposure to object storage and S3-compatible APIs

Responsibilities

  • Operate and maintain distributed storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS platforms (e.g., Weka, VAST Data)
  • Manage storage lifecycle operations: cluster expansion, upgrades, and migrations
  • Monitor and maintain storage health, including capacity utilization, data distribution and balance, and cluster state and recovery operations
  • Analyze and troubleshoot storage performance across IOPS, throughput, and latency (including tail latency)
  • Identify and remediate bottlenecks across disk subsystems, network paths (including RDMA where applicable), and client access patterns
  • Support incident response and root cause analysis for storage-related issues
  • Ensure storage platforms meet performance expectations for GPU and Kubernetes workloads
  • Operate and support Kubernetes-integrated storage: CSI drivers, StorageClasses, and PersistentVolumes / PersistentVolumeClaims
  • Troubleshoot storage-related issues in Kubernetes environments, including stateful workloads, performance inconsistencies, and scheduling and provisioning failures
  • Execute and improve automation for storage deployment and operations using Ansible, Terraform, and Kubernetes manifests / Helm
  • Contribute to improving monitoring and alerting, operational workflows, and runbooks and documentation
  • Partner with DevOps and Platform Engineering (automation and orchestration), Network Engineering (high-throughput and RDMA networking), and Compute / Virtualization teams
  • Help ensure end-to-end performance across compute, network, and storage layers

Benefits

  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance for Employees
  • Company Health Savings Account Contributions
  • 100% paid Short Term and Long Term Disability Insurance for Employees
  • Life and Voluntary Supplemental Insurance Options
  • Other Insurance Options, such as Pet & Legal Insurance
  • Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
  • Flexible Spending Account
  • 401(k)
  • Employee Assistance Program
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Other In-Office Perks