Senior Storage Performance Engineer

NVIDIA • Santa Clara, CA

About The Position

NVIDIA is seeking a highly skilled Senior Storage Performance Engineer to join our team in Santa Clara, CA. This role is essential as we continue to push the boundaries of AI and HPC technologies. You will design, implement, and analyze complex benchmarks to optimize performance across NVIDIA's infrastructure stack. Your work will directly affect the efficiency of our AI inference and training, NVIDIA NIMs, RAG pipelines, HPC codes, and storage platforms.

Requirements

12+ years of experience in performance engineering, benchmarking, or HPC/AI systems.
Deep expertise in AI/ML and deep learning frameworks (PyTorch, TensorFlow, Triton).
Strong background in storage systems and filesystems.
Proven experience with MPI, OpenMP, and Slurm in large-scale compute environments.
Proficiency in Python, Bash, and automation frameworks for job orchestration and results parsing.
Excellent communication skills; ability to context-switch between deep technical work and high-level business impact.
BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field, or equivalent experience.

Nice To Haves

Experience with RAG pipelines and vector databases (FAISS, Milvus, Qdrant).
Familiarity with Kubernetes and CSI-based persistent storage systems.
Knowledge of GPU profiling tools (Nsight Systems, PyTorch Profiler).
Experience with telemetry/monitoring frameworks (Prometheus, Grafana).
Enthusiasm for exploring the boundaries of AI, HPC, and storage capabilities.

Responsibilities

Crafting and delivering performance benchmarks across AI, HPC, and enterprise storage platforms.
Testing and benchmarking storage appliances (block, file, object) against NVIDIA data center solutions.
Running and tuning AI inference and training workloads with tools like PyTorch, TensorFlow, and NVIDIA NIMs.
Benchmarking and analyzing retrieval-augmented generation (RAG) pipelines, including ingestion, retrieval, and inference performance with vector databases.
Profiling and optimizing MPI-based and multi-node distributed applications.
Collaborating closely with product managers, system architects, and partners to fine-tune hardware/software stack performance.