NVIDIA is at the forefront of the generative AI revolution. We are looking for a Software Engineer, Performance Analysis and Optimization for LLM Inference, to join our performance engineering team. In this role, you will focus on improving the efficiency and scalability of large language model (LLM) inference on NVIDIA computing platforms through compiler- and kernel-level analysis and optimization. You will work on key components spanning IR-based compiler optimization, graph-level transformations, and precompiled kernel performance tuning to deliver improvements in inference speed and efficiency.

As a core contributor, you will collaborate with teams passionate about compiler, kernel, hardware, and framework development. You will analyze performance bottlenecks, develop new optimization passes, and validate gains through profiling and projection tools. Your work will directly influence the runtime behavior and hardware utilization of next-generation LLMs deployed across NVIDIA's data center and embedded platforms.
Job Type
Full-time
Career Level
Entry Level
Number of Employees
5,001-10,000 employees