Principal Machine Learning Engineer, Accelerated Apache Spark

NVIDIA•Santa Clara, CA

$272,000 - $431,250

About The Position

NVIDIA is looking for a Machine Learning (ML) Engineer to join the GPU-accelerated Apache Spark team. Apache Spark is the most popular data processing engine in data centers for running large-scale ETL, SQL, and ML/DL model training and inference workloads across many domains and use cases. NVIDIA GPUs offer a promising way to significantly speed up Apache Spark applications and/or lower their cost at massive scale. You will work with the open source community to accelerate Apache Spark with GPUs, and you will apply the latest ML/AI methods to help enterprises migrate Spark workloads onto GPUs at scale.

Requirements

BS, MS, or PhD in Machine Learning, Data Science, Computer Science, or a closely related field, or equivalent experience.
12+ years of professional experience in designing, implementing, and productionizing high-quality ML/DL solutions.
5+ years of experience as a technical lead in ML model development.
Proven hands-on experience (2+ years) with large-scale data processing platforms, such as Apache Spark.
Proven ability to employ modern tooling and sound techniques for all aspects of crafting, deploying, and maintaining machine learning models.
Excellent programming skills in Python and Python data science libraries such as NumPy, pandas, scikit-learn, SciPy, PyTorch, and TensorFlow.
Deep experience with advanced ML methodologies, including LLMs/GenAI, reinforcement learning, and adaptive online ML systems.
Strong expertise in feature engineering, feature importance assessment, and developing boosted tree model solutions (e.g., XGBoost).

Nice To Haves

Understanding of Apache Spark internals and architecture.
Familiarity with NVIDIA GPUs and CUDA.
Experience coding in Scala, Java, and/or C++.

Responsibilities

Design and implement machine learning solutions for performance prediction and optimization of GPU-accelerated enterprise Apache Spark workloads.
Develop advanced algorithms and adaptive systems to continuously improve the performance of Apache Spark workloads on GPUs.
Develop AI-based agents and tools that help fix system issues and optimize applications.
Collaborate with key partners and customers on the deployment of complex machine learning solutions in various environments.
Maintain deep domain expertise by keeping up with the latest published advances in ML systems and algorithms.
Provide technical mentorship and leadership in data science and machine learning to a team of engineers.