The Applied Machine Learning team builds production multimodal systems that understand and transform large-scale image, audio, and video content. Our work spans diffusion-based image generation, transcription and diarization, face and object detection, OCR and image description for search, and automated quality control of media pipelines. We are looking for a Staff Machine Learning Engineer to strengthen our diffusion and video models, adapt small and mid-sized LLMs, and turn our uniquely large corpus of weakly labeled media into a durable product advantage. This is an opportunity to shape the next generation of multimodal experiences end-to-end, from data and models to evaluation and user impact.

The Staff Machine Learning Engineer, Multimodal Generation & Post-Training, will be a senior individual contributor on a small, applied ML team focused on production multimodal systems. The role will lead fine-tuning and adaptation of diffusion and emerging video models, as well as post-training of small and medium LLMs for captioning, moderation, and retrieval-friendly descriptions. The engineer will design data and evaluation workflows that use our large archive of weakly labeled music, podcast, film, TV, and short-form content to drive measurable quality and efficiency improvements. The role includes close collaboration with partner infra teams for model serving and with adjacent product and research groups to bring new capabilities into production.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees