Senior ML Research Engineer

Adobe - San Jose, CA
Posted 4d | $172,500 - $306,625

About The Position

The Opportunity

The Adobe Applied Research Team is looking for a Machine Learning Systems Engineer. You will take ownership of the low-level optimizations that make our foundation models faster, leaner, and more scalable. This role is designed for a high-potential engineer who is passionate about the intersection of hardware and AI. You will work directly on the infrastructure that powers Adobe's next generation of multimodal and video models, focusing on implementing next-generation efficient model architectures and squeezing every TFLOP of performance out of our GPU clusters.

About Adobe

Adobe empowers everyone to create through innovative platforms and tools that unleash creativity, productivity, and personalized customer experiences. Adobe's industry-leading offerings, including Adobe Acrobat Studio, Adobe Express, Adobe Firefly, Creative Cloud, Adobe Experience Platform, Adobe Experience Manager, and GenStudio, enable people and businesses to turn ideas into impact, powered by AI and driven by human ingenuity. Our 30,000+ employees worldwide are creating the future and raising the bar as we drive the next decade of growth. We're on a mission to hire the very best and believe in creating a company culture where all employees are empowered to make an impact. At Adobe, we believe that great ideas can come from anywhere in the organization. The next big idea could be yours.

Let's Adobe Together

Learn more about Adobe life, including our values and culture; our focus on people, purpose, and community; Adobe for All; our comprehensive benefits programs; the stories we tell; the customers we serve; and how you can help us advance our mission of empowering everyone to create.

Equal Employment Opportunity

Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other characteristic protected by law. Adobe aims to make our Careers website and recruiting process accessible to all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email [email protected] or call +1 408-536-3015.

AI Use Guidelines for Interviews

Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process. At Adobe, we empower employees to innovate with AI, and we look for candidates eager to do the same. As part of the hiring experience, we provide clear guidance on where AI is encouraged during the process and where it is restricted during live interviews.

Life at Adobe

At Adobe, you will be immersed in an exceptional work environment that is recognized around the world. You will also be surrounded by colleagues who are committed to helping each other grow through our unique Check-In approach, where ongoing feedback flows freely. If you're looking to make an impact, Adobe's the place for you. Discover what our employees are saying about their career experiences on the Adobe Life blog and explore the meaningful benefits we offer.

Requirements

  • Education: Master's or PhD degree in Computer Science, Electrical Engineering, or a related field, with a focus on Parallel Computing or Systems.
  • Foundational Systems Programming: Proficiency in Python and C++, with a solid understanding of memory management and concurrency.
  • Hands-on ML Frameworks: Experience working with PyTorch or JAX, specifically moving beyond simple model training into distributed setups (e.g., PyTorch FSDP, DeepSpeed).
  • GPU Awareness: A strong grasp of GPU architecture (SRAM vs. HBM, warps, and thread blocks) and how these impact the performance of ML workloads.
  • Analytical Mindset: A "measure twice, cut once" approach to engineering—you enjoy looking at execution traces and flame graphs to find 10% wins.

Nice To Haves

  • Experience contributing to open-source efficiency libraries (e.g., vLLM, FlashAttention, or TensorRT-LLM).
  • Exposure to low-level communication libraries like NCCL and an understanding of collective operations (AllReduce, AllGather).
  • Familiarity with containerization (Docker/Kubernetes) and job scheduling in a headless Linux environment.
  • Knowledge of modern model architectures such as Mixture-of-Experts (MoE) or Diffusion Transformers (DiT).

Responsibilities

  • Kernel Development & Optimization: Write and maintain high-performance GPU kernels using Triton or CUDA to accelerate custom model layers and operations.
  • Training Efficiency: Support the scaling of large-scale training runs. You will help implement and debug distributed training strategies including ZeRO, Tensor Parallelism, and Pipeline Parallelism.
  • Inference Acceleration: Implement state-of-the-art inference techniques such as quantization (FP8/INT8), speculative decoding, and KV cache optimizations to reduce latency and cost-per-token.
  • Performance Profiling: Conduct deep-dive bottleneck analysis using tools like PyTorch Profiler, NVIDIA Nsight Compute, and Nsight Systems to identify stalls in compute, memory bandwidth, or NCCL communication.
  • System Maintenance: Collaborate on the "Model Factory" pipeline to ensure training jobs are fault-tolerant and utilize cluster resources efficiently across InfiniBand/RoCE networks.