About The Position

NVIDIA is at the forefront of the generative AI revolution. We are looking for a Software Engineer, Performance Analysis and Optimization for LLM Inference, to join our performance engineering team. In this role, you will focus on improving the efficiency and scalability of large language model (LLM) inference on NVIDIA computing platforms through compiler- and kernel-level analysis and optimization. You will work on key components spanning IR-based compiler optimization, graph-level transformations, and precompiled kernel performance tuning to deliver best-in-class inference speed and efficiency. As a core contributor, you will collaborate with teams passionate about compiler, kernel, hardware, and framework development. You will analyze performance bottlenecks, develop new optimization passes, and validate gains through profiling and projection tools. Your work will directly influence the runtime behavior and hardware utilization of next-generation LLMs deployed across NVIDIA's data center and embedded platforms.
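
To give a concrete flavor of the kernel-level work involved, here is a minimal timing sketch using CUDA events in PyTorch; the GEMM shapes are illustrative stand-ins for an LLM projection layer, not specifics of this role:

    import torch

    assert torch.cuda.is_available(), "this sketch requires a CUDA device"

    # Illustrative GEMM, roughly shaped like an LLM feed-forward projection.
    x = torch.randn(8, 4096, device="cuda", dtype=torch.float16)
    w = torch.randn(4096, 11008, device="cuda", dtype=torch.float16)

    def bench(fn, warmup=10, iters=100):
        """Time a GPU op with CUDA events, excluding warmup effects."""
        for _ in range(warmup):
            fn()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # milliseconds per call

    ms = bench(lambda: x @ w)
    flops = 2 * x.shape[0] * x.shape[1] * w.shape[1]  # 2*M*K*N for one GEMM
    print(f"{ms:.3f} ms/iter, ~{flops / (ms * 1e-3) / 1e12:.1f} TFLOP/s achieved")

CUDA events measure device-side time directly, avoiding the host-side synchronization skew that naive wall-clock timing introduces.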

Requirements

  • Master's or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
  • Foundational understanding of modern deep learning models (including transformers and LLMs) and interest in inference performance and optimization.
  • Exposure to compiler concepts such as intermediate representations (IR), graph transformations, scheduling, or code generation through coursework, research, internships, or projects (see the sketch after this list).
  • Familiarity with at least one deep learning framework or compiler/runtime ecosystem (e.g., TensorRT-LLM, PyTorch, JAX/XLA, Triton, vLLM, or similar).
  • Ability to analyze performance bottlenecks and reason about optimization opportunities across model execution, kernels, and runtime systems.
  • Experience working on class projects, internships, research, or open-source contributions involving performance-critical systems, compilers, or ML infrastructure.
  • Strong communication skills and the ability to collaborate effectively in a fast-paced, team-oriented environment.
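
For candidates unfamiliar with graph-level transformations, here is a minimal sketch of the kind of IR rewrite described above, using torch.fx; the fused_add_relu helper is a hypothetical stand-in for a real fused kernel:

    import operator
    import torch
    import torch.fx as fx

    class AddReLU(torch.nn.Module):
        def forward(self, x, y):
            return torch.relu(x + y)

    def fused_add_relu(x, y):
        # Stand-in for a fused kernel; a real pass would target an optimized op.
        return torch.relu(torch.add(x, y))

    gm = fx.symbolic_trace(AddReLU())  # trace the module into an FX graph IR

    # Pattern-match add -> relu and rewrite it to a single fused call.
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target is torch.relu:
            (inp,) = node.args
            if (isinstance(inp, fx.Node) and inp.op == "call_function"
                    and inp.target is operator.add and len(inp.users) == 1):
                with gm.graph.inserting_after(node):
                    fused = gm.graph.call_function(fused_add_relu, inp.args)
                node.replace_all_uses_with(fused)
                gm.graph.erase_node(node)  # relu is now dead
                gm.graph.erase_node(inp)   # so is the original add
    gm.graph.lint()
    gm.recompile()

    x, y = torch.randn(4, 4), torch.randn(4, 4)
    torch.testing.assert_close(gm(x, y), torch.relu(x + y))  # behavior preserved

Production graph compilers apply the same match-and-rewrite pattern at much larger scale, but the mechanics are the same: traverse the IR, match a subgraph, and replace it with a cheaper equivalent.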

Nice To Haves

  • Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.
  • Demonstrated innovative applications of agentic AI tools to enhance productivity and workflow automation.
  • Active engagement with the open-source LLVM or MLIR community to ensure tighter integration and alignment with upstream efforts.

Responsibilities

  • Analyze the performance of LLMs running on NVIDIA computing platforms using profiling, benchmarking, and performance analysis tools (see the sketch after this list).
  • Understand compiler optimization pipelines and identify optimization opportunities, including IR-based middle-end optimizations and kernel-level transformations.
  • Design and develop new compiler passes and optimization techniques to deliver best-in-class, robust, and maintainable compiler infrastructure and tools.
  • Collaborate with hardware architecture, compiler, and kernel teams to understand how hardware/software co-design enables efficient LLM inference.
  • Work with globally distributed teams across compiler, kernel, hardware, and framework domains to investigate performance issues and contribute to solutions.
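
As an illustration of the profiling workflow named in the first responsibility, here is a minimal sketch with torch.profiler; the small transformer block is an illustrative stand-in for an LLM under test:

    import torch
    from torch.profiler import profile, ProfilerActivity

    layer = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                             batch_first=True)
    x = torch.randn(8, 128, 512)  # (batch, sequence, hidden)

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        layer, x = layer.cuda(), x.cuda()
        activities.append(ProfilerActivity.CUDA)

    with torch.no_grad(), profile(activities=activities,
                                  record_shapes=True) as prof:
        for _ in range(10):
            layer(x)

    # Rank ops by total time to spot the dominant kernels / bottlenecks.
    key = "cuda_time_total" if torch.cuda.is_available() else "cpu_time_total"
    print(prof.key_averages().table(sort_by=key, row_limit=10))

A table like this is typically the first step in bottleneck analysis: it shows which operators dominate runtime and where a new kernel or compiler pass would pay off.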

Benefits

  • You will be eligible for equity and benefits.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Entry Level
  • Number of Employees: 5,001-10,000
