Senior Developer Technology Engineer - Windows AI Platform

NVIDIA•Santa Clara, CA

About The Position

At NVIDIA, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world. As a Developer Technology Engineer, you will be at the forefront of innovation, working with leading industry partners and exciting OSS projects to help them adopt groundbreaking advancements in AI and accelerated computing on NVIDIA RTX. This role offers an outstanding opportunity to collaborate with world-class talent and make a significant contribution to the next era of enterprise and consumer AI.

Requirements

A proven track record of 8+ years of professional experience in local GPU deployment, profiling and optimization.
Bachelor's or Master's degree or equivalent experience in Computer Science, Engineering, or a related field.
Strong proficiency in C/C++, Python, software design, and programming techniques.
Familiarity with and development experience on the Windows operating system.
Experience working with open-source LLM and GenAI software.
Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite.
Some travel is required for conferences and for on-site visits with external partners.
Strong problem-solving skills and the ability to work both independently and collaboratively in a fast-paced environment.
Excellent interpersonal and communication skills and a passion for keeping up with the latest advancements in AI technology.

Nice To Haves

Experience with GPU-accelerated AI inference driven by NVIDIA APIs, specifically cuDNN, CUTLASS, TensorRT.
Demonstrated expert knowledge of Vulkan and/or DX12.
Detailed knowledge of the latest generation GPU architectures.
Experience with AI deployment on NPUs and ARM architectures.

Responsibilities

Work closely with internal engineering and product teams and external app developers on solving local end-to-end AI GPU deployment challenges on the NVIDIA RTX AI platform.
Apply powerful profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications and detect insufficient GPU utilization that results in suboptimal runtime performance.
Conduct hands-on training sessions, develop sample code, and host presentations to provide guidance on efficient end-to-end AI deployment targeting optimal runtime performance on NVIDIA ARM-based SoCs.
Improve the Windows LLM and GenAI user experience on NVIDIA RTX by working on feature and performance enhancements of OSS software, including projects such as GGML, llama.cpp, Ollama, and ONNX Runtime.
Collaborate with GPU driver and architecture teams as well as NVIDIA research to influence next-generation GPU features by providing real-world workflows and giving feedback on partner and customer needs.
Provide technical leadership and mentorship to junior engineers, encouraging an inclusive and high-performing team environment.