We are now looking for a Senior Software Engineer for Quantized Inference! NVIDIA is seeking software engineers to accelerate the discovery and deployment of efficient inference recipes for LLMs. A recipe defines which operators are transformed into low-precision or sparsified variants, unlocking throughput and latency gains without regressing accuracy or verbosity. Recipes may incorporate techniques such as rotations, block scaling to attenuate outlier impact, or improved calibration data drawn from SFT/RL pipelines.

Each new recipe demands corresponding kernel- and model-level implementations in inference engines (vLLM, TRT-LLM, SGLang). The candidate will translate recipe specifications into functionally correct, performant code: writing Triton kernels, inserting quantize/dequantize nodes into prefill and decode paths, and ensuring that per-expert scaling in MoE layers is handled correctly. From there, the candidate will collaborate with partner inference teams to further optimize throughput and interactivity on target workloads. This work is a core component of our productization effort across Megatron-LM, ModelOpt, and vLLM.
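To give a flavor of the quantize/dequantize transforms mentioned above, here is a minimal, illustrative sketch of symmetric per-tensor int8 quantization in NumPy. The function names and the per-tensor scheme are assumptions for illustration only; production recipes in engines like vLLM or TRT-LLM use fused kernels and finer-grained (e.g. per-channel or block) scaling.

```python
# Illustrative sketch only: symmetric per-tensor int8 quantize/dequantize,
# the kind of transform a recipe applies to selected operators.
# Function names are hypothetical, not from any specific inference engine.
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float values to int8 using a single symmetric scale."""
    scale = float(np.abs(x).max()) / 127.0 or 1.0  # guard all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 2.0, -1.25], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize_int8(q, s)
# Per-element round-trip error is bounded by scale / 2
```

Block scaling, as named in the posting, generalizes this by computing a separate scale per small block of elements, which attenuates the impact of outliers on quantization error.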
Job Type: Full-time
Career Level: Mid Level