Senior Deep Learning Algorithm Engineer

NVIDIA • Santa Clara, CA


About The Position

NVIDIA’s GPU Workload Efficiency (GWE) team is looking for a skilled Senior Engineer to enhance performance in training and inference. We are developing methods to improve the efficiency of AI workloads on NVIDIA GPUs. This position entails collaborating on GPU architecture, deep learning frameworks, and large-scale applications to optimize performance. Come aboard and be a part of a team that spearheads the evolution in AI computing!

Requirements

5+ years of relevant experience in deep learning, high-performance computing, or related fields.
Master’s or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
Extensive background in improving deep learning workloads, with a deep understanding of training and inference constraints.
Proven ability in GPU performance analysis and profiling, with hands-on experience applying advanced optimization techniques.
Solid knowledge of computer architecture and familiarity with the fundamentals of GPU development.
Strong programming skills in Python and C++.

Nice To Haves

Proven track record of analyzing, modeling, and tuning application performance with measurable impact.
Concrete experience in optimizing models in PyTorch for both training and inference tasks.
Contributions to performance tooling, profiling infrastructure, or diagnostics that improved training and inference efficiency.
Background in GPU programming (CUDA or OpenCL) is a strong plus, though not required.

Responsibilities

Evaluating, explaining, and improving deep learning workloads for both training and inference, contributing to advancements in throughput, latency, and efficiency across NVIDIA GPU platforms.
Collaborating across NVIDIA with researchers, engineers, and hardware specialists to identify bottlenecks and achieve performance improvements.
Developing production-quality software across the deep learning platform stack, from frameworks to deployment.
Building automation and diagnostics that enable reproducible, scalable, and backend-agnostic performance improvements.

Benefits

You will be eligible for equity and benefits.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
