Applied Scientist

Adobe · San Jose, CA

About The Position

Adobe Firefly’s ASML group is seeking research scientists and engineers passionate about conditional generation and editing with large generative AI models, with an emphasis on images and videos. We strive to advance generative AI technology while ensuring our models deliver excellent quality and control. We are especially interested in candidates with experience in large-scale, industry-level pre-training and mid-training of multi-modality generative models. This role directly affects the quality of Adobe’s image and video generation models, supporting next-generation creative workflows for millions of users. As an Applied Scientist at Adobe, you will join a world-class team of applied researchers and engineers building the future of digital experiences. You will innovate across the full training stack, collaborate across data, modeling, and product, and see your work ship to customers worldwide.

Requirements

  • Master’s or Ph.D. degree in Computer Science, Machine Learning, or a related field.
  • Solid understanding of modern generative architectures (e.g., diffusion models) and familiarity with conditional generation or editing methods for image, video, or audio tasks.
  • Experience implementing machine learning models using modern deep learning frameworks (e.g., PyTorch), with exposure to large-scale or distributed training workflows.
  • Experience or research background in VLM finetuning with a focus on image, video, and audio understanding, including adapting pretrained vision-language models for downstream multimodal tasks.
  • Familiarity with captioning for large-scale data, including designing or applying automated captioning pipelines for image, video, and audio datasets.
  • Strong coding and prototyping ability in Python and PyTorch.
  • Excellent communication skills and ability to collaborate across cross-functional teams.

Responsibilities

  • Contribute to the design, implementation, and evaluation of mid-training approaches that improve editing capabilities for Adobe's multimodal generative models across image, video, and audio.
  • Own well-defined components within the mid-training stack — such as image-to-image editing or instruction-based editing — designing and running experiments to test hypotheses and identify quality gaps.
  • Build and maintain large-scale captioning pipelines and support VLM finetuning efforts to improve multimodal understanding across visual and auditory domains.
  • Assist in building scalable workflows for data curation, quality improvements, and distributed training, applying research insights from diffusion models and large-scale training to practical model improvement.
  • Collaborate closely with research, data, evaluation, infrastructure, pre-training, and post-training teams, contributing to knowledge sharing and documentation of experiments, datasets, and training approaches.

Benefits

  • Comprehensive benefits programs