Senior Deep Learning Software Engineer

NVIDIA
Redmond, WA
Hybrid

About The Position

We are looking for a Senior Deep Learning Software Engineer to design and build our automated inference and deployment solution. As part of the team, you will be instrumental in defining a scalable architecture for DL inference, with an emphasis on ease of use and compute efficiency. Your work will span multiple layers of the DL deployment stack: developing features in high-level frameworks such as PyTorch and JAX, designing and implementing a high-performance execution environment, performing low-level GPU optimizations, and writing custom GPU kernels in CUDA and/or Triton. This is an exceptional opportunity for passionate software engineers who straddle the boundary between research and engineering and have a strong background in both machine learning fundamentals and software architecture and engineering. NVIDIA is increasingly known as “the AI computing company” and is widely considered one of the technology world’s most desirable employers. Are you creative, motivated, and up for a challenge? If so, we want to hear from you! Come join our model optimization group, where you can help build real-time, cost-effective computing platforms that drive our success in this exciting and rapidly growing field.

Requirements

  • Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
  • 8+ years of relevant work or research experience in Deep Learning.
  • Excellent software design skills, including debugging, performance analysis, and test design.
  • Strong proficiency in Python, PyTorch, and related ML tools.
  • Strong algorithms and programming fundamentals.
  • Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.

Nice To Haves

  • Contributions to PyTorch, JAX, or other Machine Learning Frameworks.
  • Knowledge of GPU architecture and compilation stack, and capability of understanding and debugging end-to-end performance.
  • Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
  • Prior experience writing high-performance GPU kernels for machine learning workloads using CUDA, CUTLASS, or Triton.

Responsibilities

  • Play a pivotal role in defining a modular, scalable platform that seamlessly bridges training and deployment workflows, enabling tight integration of deployment tooling with training frameworks such as Megatron and NeMo.
  • Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary PyTorch models for our automated deployment solution (see the sketch after this list).
  • Develop support for inference optimization techniques such as speculative decoding and LoRA.
  • Collaborate with teams across NVIDIA to use performant kernel implementations within the automated deployment solution.
  • Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
  • Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and extend their leadership in the market.
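
As an illustration of the graph-capture responsibility above, the following minimal sketch shows how torch.export can capture a PyTorch module into a standardized FX graph that deployment tooling can consume. The ToyMLP model and input shape are illustrative assumptions, not part of any actual NVIDIA codebase.

```python
# Minimal sketch: capturing a standardized graph representation with torch.export.
# ToyMLP and the example input shape are illustrative assumptions.
import torch
import torch.nn as nn
from torch.export import export


class ToyMLP(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))


model = ToyMLP().eval()
example_inputs = (torch.randn(8, 128),)

# torch.export traces the module into an ExportedProgram: a single functionalized
# FX graph plus its parameters, buffers, and input/output signature.
exported_program = export(model, example_inputs)

# The captured graph is the standardized representation that downstream deployment
# tooling can analyze and lower to optimized kernels.
print(exported_program.graph_module.graph)
```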

Benefits

  • Equity
  • Benefits