NVIDIA • Posted about 1 year ago
$148,000 - $276,000/Yr
Full-time • Senior
Santa Clara, CA
Computer and Electronic Product Manufacturing

NVIDIA is seeking a Senior High Performance Computing (HPC) and AI Networking Performance Research and Analysis Engineer to join our Performance group. This role involves profiling and analyzing AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training, with a focus on collective communication and networking. The engineer will develop performance analysis tools and methodologies to understand performance expectations, limitations, and bottlenecks in high-performance networking environments.

  • Exploring and researching AI workloads and DL models for large-scale deep learning LLM training on NVIDIA supercomputers and distributed systems.
  • Benchmarking, profiling, and analyzing performance to identify bottlenecks and areas for improvement, with a focus on networking aspects.
  • Implementing performance analysis tools.
  • Collaborating with teams from hardware to software to provide performance analysis insights.
  • Defining performance test planning and setting performance expectations for new technologies and solutions.
  • B.Sc in Computer Science or Software Engineering or equivalent experience.
  • 5+ years of experience with high-performance Networking (RDMA, MPI, NCCL, Congestion Control Algorithms).
  • Demonstrated performance analysis skills and methodologies.
  • Experience with NVIDIA GPUs, CUDA library, and deep learning frameworks like TensorFlow or PyTorch.
  • Expertise in networking collective communication libraries (such as NCCL) and protocols (such as RoCE and RDMA).
  • Strong analytical and problem-solving skills with fast self-learning capabilities.
  • Proficiency in programming languages: Python, Bash, and C.
  • Experience with Linux distributions.
  • Good communication and interpersonal skills.
  • In-depth knowledge of, and experience with, AI workloads and benchmarking for distributed LLM training.
  • Knowledge of the CUDA and NCCL libraries.
  • Knowledge of congestion control algorithms.
  • In-depth system knowledge (Intel / AMD / ARM CPUs, NVIDIA GPUs, HCA, Memory, PCI).
  • Strong performance analysis skills using modern tools.
  • Highly competitive salaries
  • Comprehensive benefits package
  • Equity options
  • Diverse and supportive work environment