System Software Engineer, LLM Inference and Performance Optimization

Nvidia, Santa Clara, CA
$180,000 - $339,250

About The Position

As a System Software Engineer specializing in LLM Inference and Performance Optimization, you will play a crucial role in advancing AI technologies. This position focuses on optimizing large language models for real-time performance across various hardware platforms, contributing to innovative solutions that shape the future of technology.

Requirements

  • 8+ years of professional C++ experience, with a deep understanding of memory management, concurrency, and low-level optimizations.
  • M.S. or higher degree (or equivalent experience) in Computer Science/Engineering or a related field.
  • Strong experience in system-level software engineering, including multi-threading, data parallelism, and performance tuning.
  • Validated expertise in LLM inference, with experience in model serving frameworks like ONNX Runtime and TensorRT.
  • Familiarity with real-time systems and performance-tuning techniques, especially for machine learning inference pipelines.
  • Ability to work collaboratively with Machine Learning Engineers and cross-functional teams to align system-level optimizations with model goals.
  • Extensive understanding of hardware architectures and the ability to leverage specialized hardware for optimized ML model inference.

Nice To Haves

  • Experience with deep learning hardware accelerators, such as Nvidia GPUs.
  • Familiarity with ONNX, TensorRT, or cuDNN for LLM inference on GPU.
  • Experience with low-latency optimizations and real-time system constraints for ML inference.

Responsibilities

  • Design, implement, and optimize inference logic for fine-tuned LLMs, collaborating closely with Machine Learning Engineers.
  • Develop efficient, low-latency glue logic and inference pipelines that are scalable across various hardware platforms, ensuring outstanding performance and minimal resource usage.
  • Leverage hardware accelerators such as GPUs and other specialized hardware to improve inference speed and support real-world applications.
  • Collaborate with cross-functional teams to integrate models seamlessly into diverse environments, adhering to strict functional and performance requirements.
  • Conduct detailed performance analysis and optimization for specific hardware platforms, focusing on efficiency, latency, and power consumption.

Benefits

  • Equity options
  • Comprehensive health insurance
  • Retirement savings plan
  • Paid time off
  • Flexible work hours
  • Professional development opportunities


What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Industry: Computer and Electronic Product Manufacturing
  • Education Level: Master's degree
