Senior ML Framework Performance Engineer - AI for Science at Scale

NVIDIA • Santa Clara, CA

About The Position

NVIDIA has become the platform upon which every new AI-powered application is built. We are seeking a Senior Machine Learning Performance Engineer to join our team of scientists and engineers passionate about building the next generation of scientific machine learning (ML) frameworks. Starting with digital biology, we will enable powerful and efficient ML methods through collaborations with industry and academic partners. Together, we will advance NVIDIA’s capacity to accelerate AI for Science and the industries that depend on it.

Requirements

Advanced degree in a quantitative field such as Computer Science, Computational Biophysics, Computational Chemistry, Physics, Mathematics, or equivalent experience
5+ years of relevant experience
Consistent track record of performance engineering for large-scale AI model training and inference, with a deep understanding of these models' compute bottlenecks and of parallelism paradigms such as model parallelism.
Expertise in modern machine learning frameworks such as PyTorch, JAX, and Warp, and in the distributed learning strategies they support
Up-to-date knowledge of ML research in scientific discovery and in atomistic modeling
Experience designing, building, packaging, and launching software products based on ML research or atomistic simulation tools
Recognized technical leadership, the ability to work self-directed, and the ability to learn from and teach others
You should display strong communication skills, be organized and self-motivated, and play well with others (be an excellent teammate!)

Nice To Haves

Contributions to a major scientific codebase for atomistic modeling or AI for Science
Experience with CUDA/Triton programming or familiarity with CUDA/Triton extensions of ML frameworks

Responsibilities

Design performance and accuracy evaluation frameworks and carry out evaluations of pioneering ML models used in scientific discovery, particularly those related to atomistic modeling.
Identify end-to-end model execution bottlenecks, and design and implement scalable solutions such as model parallelism.
Drive the testing and maintenance of the algorithms and software stack used in AI for Science applications, both within the company and in the open-source community.
Stay up-to-date on the latest machine learning technologies and evaluate their potential as solutions to accuracy and/or computational performance bottlenecks.
Collaborate with multiple high-performance computing, AI infrastructure, and research teams.
Contribute to documentation or educational content related to the product.