About The Position

This is an opportunity to join a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader in AI and multi-cloud data management at scale, powering many of the world's most demanding AI data centers across industries ranging from life sciences and healthcare to financial services, autonomous vehicles, government, academia, research, and manufacturing. DDN's A3I solutions and data intelligence platform are designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. Our success is driven by an unwavering commitment to innovation, customer success, and a team of passionate professionals who bring expertise and dedication to every project. This is a chance to make a lasting impact at a company shaping the future of AI and data management.

Requirements

  • 7+ years of experience in performance engineering, benchmarking, or HPC/AI systems.
  • Deep experience with AI/ML and deep learning frameworks (PyTorch, TensorFlow, ONNX, Triton).
  • Familiarity with NVIDIA NIM microservices and containerized model-serving stacks.
  • Proven expertise with MPI and OpenMP, and with Slurm or similar schedulers, in large-scale compute environments.
  • Solid understanding of file and storage systems (e.g., POSIX, Lustre, S3, NVMe-oF).
  • Strong Linux skills (debugging, tuning, networking, storage stack).
  • Proficiency in scripting (e.g., Bash, Python) for job orchestration and result parsing.
  • Ability to create clear Excel graphs and presentations from raw benchmark data.
  • Strong communication skills — able to convey technical results and trade-offs to engineering and customer-facing teams.

Nice To Haves

  • Experience with RAG pipelines, vector databases (e.g., FAISS, Milvus, Qdrant).
  • Familiarity with Kubernetes and CSI-based persistent volume systems.
  • Understanding of GPU profiling tools (Nsight, nvprof, PyTorch Profiler).
  • Knowledge of telemetry and monitoring frameworks (e.g., Prometheus, Grafana).
  • Prior work publishing or presenting technical performance results.

Responsibilities

  • Design and execute performance benchmarks across AI, HPC, and storage platforms.
  • Run and tune AI inference workloads using frameworks such as PyTorch, TensorFlow, Triton, NVIDIA NIM, and vector databases.
  • Benchmark large-scale RAG pipelines including data ingestion, retrieval, and inference performance.
  • Profile and optimize MPI and multi-node distributed applications.
  • Compile and debug C/C++, Python, and CUDA-based code across heterogeneous systems.
  • Generate automated test scripts and benchmarking workflows (e.g., with Bash, Python, or Slurm job scripts).
  • Analyze and visualize results using Excel, Jupyter, or reporting tools; create comparison graphs and KPIs.
  • Write clear, concise performance reports for both technical and non-technical stakeholders.
  • Present findings internally and externally, translating results into architectural guidance for field engineers and sales teams.
  • Collaborate with system engineers, product managers, and partners to tune and improve software/hardware stack performance.
  • Validate and tune performance on storage systems including parallel file systems (e.g., Lustre, GPFS), object storage, and NVMe over Fabrics.
  • Contribute to internal tooling to automate test cycles and performance regression tracking.