Sr Machine Learning Engineer - Infra

Adobe•San Jose, CA

$172,500 - $306,625

About The Position

Changing the world through digital experiences is what Adobe’s all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We’re passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!

The Opportunity

Our focus is developing AI technologies for text, images, and videos to boost creativity. We're seeking an outstanding ML infrastructure engineer with deep expertise in building large-scale foundation model infrastructure that supports all the generative AI efforts in Firefly! This is a chance to create a huge impact in a fast-paced, startup-like environment within a great company. Join us!

The position involves building infrastructure across various components of our foundation model stack, including large-scale data processing, scalable and reliable PyTorch training infrastructure, GPU optimizations with custom CUDA kernels on the latest NVIDIA GPUs, and more!

Requirements

Graduate, PhD, or postgraduate degree in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years of ML engineering experience, specializing in generative AI such as LLMs.
Strong Python and deep learning engineering skills, paired with experience training and running inference with PyTorch or TensorFlow, are essential.
Familiarity with distillation, transformers, and diffusion models.
Knowledge of deployment technologies such as Docker, MLOps, and ML services is valuable; experience with cloud platforms such as Azure and AWS is a plus.
We value your excellent problem-solving abilities and your capacity to analyze complex issues and drive solutions with a data-driven approach.
Your strong verbal and written communication skills and success in cross-functional team environments will help us all succeed together.

Nice To Haves

Experience with generative image and video is a plus.

Responsibilities

You'll build and optimize the infrastructure that powers large foundation model training on thousands of GPUs.
You will profile GPU utilization, trace inference and training runs and help craft strategies for optimizing our ML model latency.
We'll work together to architect and optimize end-to-end ML pipelines, ensuring they're scalable, efficient, and robust.
You'll dive deep into data to recommend the right models, evaluation metrics, and governance approaches.
Throughout the product lifecycle, you'll engage in the architecture, design, deployment, and optimization of ML models and systems.
