About The Position

Intelligent machines powered by Artificial Intelligence computers that can learn, reason and interact with people are no longer science fiction. GPU Deep Learning has provided the foundation for machines to learn, perceive, reason and solve problems. Today, visual computing is a crucial tool in helping people get along with technology, and NVIDIA has extended its technology into datacenters, mobile devices and cars. There has never been a more exciting time to join our team - if this role sounds like a fit for you, we'd love to hear from you!

NVIDIA is seeking a Technical Marketing Engineer to join our Ethernet Networking team to keep improving our performance leadership in AI. In this pivotal role, you will be the hands-on expert for our Spectrum-X Ethernet platform, showcasing its superiority for emerging AI use cases. You will develop and implement rigorous benchmarks on various GPU clusters, analyzing everything from LLM training to groundbreaking inference workloads. Your primary mission is to translate these performance results into compelling technical content, including white papers, blogs, and presentations, that clearly articulates why NVIDIA's Spectrum-X Ethernet solutions are the definitive choice for modern AI infrastructure.

Requirements

  • B.Sc. in Computer Science or Software Engineering, or equivalent experience.
  • 5+ years of experience benchmarking and analyzing high‑performance networking solutions, including RDMA, MPI, and large‑scale collective communication frameworks.
  • Hands‑on expertise in testing and benchmarking deep learning workloads on NVIDIA GPUs with CUDA, TensorFlow, and PyTorch, focused on validating and demonstrating distributed training and inference performance over NCCL, RoCE, and RDMA.
  • Demonstrated proficiency in performance analysis methodologies and techniques.
  • Understanding of Ethernet and high-performance networking.
  • Programming experience in Python, Bash, and C.
  • Experience with distributed job orchestration (Slurm, Kubernetes).
  • Experience with Linux distributions.
  • Ability to learn quickly and independently, with strong analytical and problem-solving skills.
  • In-depth knowledge and experience with AI workloads and benchmarking for large-scale distributed training/inference systems.

Nice To Haves

  • Strong Performance Analysis skills and methodologies using modern tools.
  • Deep knowledge of AI/data center Ethernet network protocols and best practices (Clos fabrics, BGP, VXLAN, etc.).
  • Hands-on experience with automation, CI/CD pipelines and DevOps practices.
  • Expertise in AI fabric telemetry, including metrics capture and analysis, and in telemetry tools such as Prometheus and Grafana.
  • In-depth system-level knowledge (Intel / AMD / ARM CPUs, NVIDIA GPUs, NICs, memory, PCI).

Responsibilities

  • Design and execute performance benchmarks using industry-standard tools such as MLPerf, UCX, the NVIDIA Collective Communications Library (NCCL), and CloudAI, along with customer-representative AI workloads, on our state-of-the-art GPU clusters.
  • Translate your benchmark data and technical insights into compelling, high-impact marketing assets and performance-driven sales enablement materials.
  • Collaborate closely with the Product Management, ASIC and Software Architecture, and Sales teams; provide feedback on product features; and ensure our performance results are technically accurate and impactful.
  • Drive the performance characterization of complex training and inference workloads on world-class AI supercomputers, developing rigorous metrics to isolate bottlenecks and guide optimization across the full silicon-to-software stack.