About The Position

The Senior Lead Storage and Server Test Engineer will play a pivotal role in the design, development, and execution of comprehensive test strategies for our AI data center's storage and server infrastructure. This leadership position requires deep expertise in enterprise storage systems, server architectures, and networking, as well as a strong understanding of the unique performance and reliability demands of AI/ML workloads. The ideal candidate will be a hands-on technical leader who can mentor junior engineers, drive test automation, and collaborate across engineering teams to deliver robust, high-performing solutions.

Requirements

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
  • 7+ years of experience in hardware and/or software testing, with at least 5 years focused on enterprise-level storage and server systems.
  • 3+ years of experience in a lead or senior technical role, mentoring and guiding other engineers or leading test initiatives.
  • Deep expertise in various storage technologies including NVMe, SAS/SATA SSDs/HDDs, RAID, distributed file systems (e.g., Ceph, Lustre, GPFS), SAN, and NAS.
  • Strong understanding of server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, and power management.
  • Strong understanding of Baseboard Management Controller (BMC) functionality.
  • Proficiency in scripting languages (e.g., Python, Bash) for test automation and data analysis.
  • Experience with Linux operating systems (e.g., Ubuntu, CentOS, RHEL) and command-line tools.
  • Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies.
  • Experience with test methodologies such as performance testing, reliability testing, stress testing, and fault injection.
  • Excellent problem-solving, analytical, and debugging skills.
  • Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse teams.

Nice To Haves

  • Familiarity with the Open Compute Project (OCP).
  • Experience with cloud environments (AWS, Azure, GCP) and virtualization technologies.
  • Knowledge of containerization technologies (Docker, Kubernetes).
  • Familiarity with AI/ML frameworks (e.g., TensorFlow, PyTorch) and their infrastructure requirements.
  • Experience with performance and benchmarking tools (e.g., fio, Iometer, perf, VTune).
  • Contributions to open-source projects related to storage, servers, or testing.
  • Certifications in relevant technologies (e.g., NetApp, Dell EMC, HPE, NVIDIA).

What This Job Offers

Career Level

Senior

Industry

Computer and Electronic Product Manufacturing

Education Level

Bachelor's degree

Number of Employees

5,001-10,000 employees

© 2024 Teal Labs, Inc