Senior Research Engineer

NVIDIA
US, CA
12d

About The Position

Join NVIDIA and help build the software that will define the future of generative AI. We are looking for a research engineer who is passionate about open-source and excited to create our next-generation post-training software stack. You will work at the intersection of research and engineering, collaborating with the Post-Training and Frameworks teams to invent, implement, and scale the core technologies behind our Nemotron models.

Requirements

  • BS, MS, or PhD in Computer Science, AI, Applied Math, or a related field, or equivalent experience.
  • 3+ years of proven experience in machine learning, systems, distributed computing, or large-scale model training.
  • Experience with AI frameworks such as PyTorch or JAX.
  • Experience with at least one inference and deployment environment, such as vLLM, SGLang, or TRT-LLM.
  • Proficient in Python programming, software design, debugging, performance analysis, test design and documentation.
  • Strong understanding of AI/Deep-Learning fundamentals and their practical applications.

Nice To Haves

  • Contributions to open source deep learning libraries
  • Hands-on experience in large-scale AI training, with a deep understanding of core compute system concepts (such as latency/throughput bottlenecks, pipelining, and multiprocessing) and demonstrated excellence in related performance analysis and tuning.
  • Expertise in distributed computing, model parallelism, and mixed precision training
  • Prior experience with generative AI techniques applied to LLMs and multimodal learning (text, image, and video).
  • Knowledge of GPU/CPU architecture and related numerical software.

Responsibilities

  • Work with applied researchers to design, implement, and test the next generation of RL and post-training algorithms.
  • Contribute to and advance open source by developing NeMo-RL, Megatron Core, NeMo Framework, and yet-to-be-announced software.
  • Work as part of a single team during post-training of Nemotron models.
  • Solve large-scale, end-to-end AI training and inference challenges, spanning the full model lifecycle from initial orchestration, data pre-processing, running of model training and tuning, to model deployment.
  • Work at the intersection of computer-architecture, libraries, frameworks, AI applications and the entire software stack.
  • Tune and optimize performance, and train models with mixed-precision recipes on next-generation NVIDIA GPU architectures.
  • Publish and present your results at academic and industry conferences.

Benefits

  • Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
  • The base salary range is 160,000 USD - 258,750 USD for Level 3, and 184,000 USD - 299,000 USD for Level 4.
  • You will also be eligible for equity and benefits.
  • NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 5,001-10,000 employees