Senior Applied ML Scientist – Generative Video

Apple • Cupertino, CA

42d

About The Position

We are looking for a passionate Senior Applied ML Researcher to drive innovation in generative video modeling, with a strong focus on diffusion-based methods and related generative features. In this role, you will research, design, and deploy models at the intersection of machine learning research and real-world product impact. You will collaborate closely with research scientists, engineers, and product teams to find novel applications of generative AI capabilities to assist our creative user base. Your mission is to elevate the workflows of millions of creators by combining generative AI with Apple’s human-centered design principles.

Requirements

MS in Computer Science, Machine Learning, or a related field, or equivalent practical experience
4+ years of experience in deep learning for generative models, particularly diffusion-based methods
Hands-on experience in distributed training of large models using PyTorch (or equivalent frameworks)
Solid understanding of video representations, spatiotemporal modeling, and neural network optimization
Experience designing & training multi-modal DiT-style Transformers, latent diffusion, or multi-stage video generation pipelines
Experience designing & training adapters
Proven ability as a creative problem solver

Nice To Haves

PhD with research focused on generative modeling, diffusion models, or video understanding
Experience with text-to-video, image-to-video, or video-to-video generation
Publications in top-tier ML conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV)

Responsibilities

Design and train state-of-the-art generative video models, primarily based on diffusion, consistency, rectified flow, or related generative frameworks
Explore novel architectures for spatiotemporal modeling (e.g., 3D U-Nets, DiT-style Transformers, hybrid CNN-Transformer models)
Conduct experiments on long-range temporal coherence, motion consistency, controllability, and multi-modal conditioning (text, audio, images)
