Senior Machine Learning Engineer, Firefly Foundry

Adobe • San Jose, CA

About The Position

Firefly Foundry is Adobe’s enterprise managed-service offering for custom multimedia generative AI. In this role, you will build the pipelines and services that turn Firefly Foundry’s models into reliable, enterprise-grade products. You will compose heterogeneous model pipelines, including fine-tuned LLMs, image and video generation models, 3D mesh reconstruction, upsamplers, NSFW and safety checkers, and IP guardrail models. You will deploy them as services, scale them to enterprise traffic, and ensure they meet SLAs for availability and latency while maintaining served quality.

This is a high-ownership role in a fast-moving environment, with direct, measurable impact on the availability, latency, cost, and quality of Firefly Foundry’s offerings. Depending on your focus, you may own externalizable data pipelines for self-serve fine-tuning, optimized VLM deployments for media intelligence, or the platform for rapid pipeline deployment with full observability.

Requirements

5+ years in machine learning engineering, with significant ownership of production ML or inference services at scale.
Strong Python and deep-learning engineering skills (PyTorch), with hands-on experience deploying and scaling model-backed services.
Experience composing multi-model pipelines and serving them behind APIs — orchestration, batching, autoscaling, and version management.
A track record owning production SLAs — availability, latency, and throughput — backed by real observability, monitoring, and alerting.
Comfort working across multiple, distinct generative model architectures (LLMs and VLMs, diffusion and transformer models, 3D/mesh) — enough to integrate, optimize, and reason about output quality, in partnership with Applied Science.
Experience with multi-tenant systems and data isolation in an enterprise or regulated context.
Fluency with containers and orchestration (Docker, Kubernetes), CI/CD for ML, and a major cloud (AWS or Azure).
GPU inference optimization for latency and cost: quantization, batching, and serving runtimes.
Strong, data-driven problem-solving skills and excellent communication within cross-functional teams.
Master’s or PhD in Computer Science, Computer Engineering, or a related field — or equivalent practical experience building and operating production ML systems.

Nice To Haves

Experience writing custom CUDA code.

Responsibilities

Compose and build production pipelines from a heterogeneous set of models — prompt-rewrite LLMs, image and video generation, 3D mesh reconstruction, upsampling, NSFW and safety checkers, and IP guardrails.
Deploy these pipelines as services, scale them to enterprise traffic, and hold them to SLAs for availability, latency, and throughput.
Ensure served quality matches the training and reference environment — closing train/serve gaps across precision, preprocessing, and model versions.
Engineer for enterprise from the ground up: tenancy boundaries, data isolation, and the controls that let us honor customer IP contracts under audit.
Build the platform underneath it all — rapid pipeline deployment, observability, monitoring, and alerting.
Build externalizable data pipelines that power self-serve fine-tuning flows for enterprise customers.
Stand up optimized VLM deployments for media intelligence and content querying.