Staff Applied Scientist

Adobe • San Jose, CA

About The Position

Adobe Firefly’s ASML group is looking for research scientists and engineers who are passionate about conditional generation and editing with large generative AI models, with an emphasis on images and videos. We strive to advance generative AI technology while ensuring that our models deliver excellent quality and control. We are especially interested in candidates with experience in large-scale, industry-level pre-training and mid-training of multimodal generative models. This role has a direct effect on the quality of Adobe’s image and video generation models, which support next-generation creative workflows for millions of users. As an Applied Scientist at Adobe, you will join a world-class team of applied researchers and engineers building the future of digital experiences. You will have the opportunity to innovate across the full training stack, collaborate across data, modeling, and product, and see your work ship to customers worldwide.

Requirements

Ph.D. in Computer Science, Machine Learning, or a related field, with significant industry experience building and shipping large-scale ML systems.
Deep expertise in modern generative architectures such as diffusion models, with experience owning end-to-end conditional generation or editing pipelines for image, video, or audio.
Proven ability to architect and scale ML systems using frameworks like PyTorch, including leading distributed training infrastructure design.
Extensive experience in VLM finetuning for image, video, and audio understanding, with a track record of aligning research goals with product requirements.
Experience owning large-scale automated captioning pipelines across image, video, and audio datasets.
Strong software engineering skills in Python and PyTorch, with emphasis on production-quality systems.
Excellent communication skills with the ability to influence technical direction across teams and present strategy to senior leadership.

Responsibilities

Define and drive the technical strategy for mid-training approaches that improve editing capabilities across Adobe's multimodal generative models for image, video, and audio.
Own and drive multiple complex workstreams within the mid-training stack (e.g., image-to-image editing, instruction-based editing, cross-modal editing), making key architectural and prioritization decisions.
Set technical direction for large-scale captioning pipelines and lead VLM finetuning strategy to improve multimodal understanding across visual and auditory domains.
Own end-to-end workflows for data curation, quality improvements, and distributed training, driving infrastructure decisions that unblock the broader organization.
Drive alignment across research, data, evaluation, infrastructure, pre-training, and post-training teams, influencing leadership on technical strategy and investment priorities.
Mentor junior and mid-level engineers through design reviews and technical guidance, raising the team's overall capability.