DL Performance Software Engineer - LLM Inference

NVIDIA · Santa Clara, CA
$120,000 - $189,750

About The Position

At NVIDIA, we believe artificial intelligence (AI) will fundamentally transform how people live and work. Our mission is to advance AI research and development to create groundbreaking technologies that enable anyone to harness the power of AI and benefit from its potential. Our team consists of experts in AI, systems, and performance optimization, and our leadership includes world-renowned experts in AI systems who have received multiple academic and industry research awards. As a member of the LLM inference team, you will help build innovative software that makes LLM inference more efficient, scalable, and accessible. Are you interested in architecting and implementing the best inference stacks in the LLM world? You will collaborate with a diverse set of teams working on resource orchestration, distributed systems, inference engine optimization, and high-performance GPU kernels. Come join our team and contribute to pioneering accelerated computing and AI.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent experience.
  • Strong coding skills in Python and C/C++.
  • 2+ years of industry experience in software engineering or equivalent research experience.
  • Knowledgeable and passionate about machine learning and performance engineering.
  • Proven project experience building software where performance is a core offering.

Nice To Haves

  • Solid fundamentals in machine learning, deep learning, operating systems, computer architecture and parallel programming.
  • Research experience in systems or machine learning.
  • Project experience in modern DL software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.
  • Experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.

Responsibilities

  • Write safe, scalable, modular, and high-quality C++ and Python code for our core LLM inference backend software.
  • Perform benchmarking, profiling, and system-level programming for GPU applications.
  • Provide code reviews, design docs, and tutorials to facilitate collaboration among the team.
  • Write unit tests and performance tests for different stages of the inference pipeline.

Benefits

  • Equity and benefits.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Computer and Electronic Product Manufacturing

Education Level

Bachelor's degree
