Member of Technical Staff - ML Training Systems

Modal•San Francisco, CA

51d•Onsite

About The Position

Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. Companies like Suno, Lovable, and Substack rely on Modal to move from prototype to production without the burden of managing infrastructure.

We're a fast-growing team based in NYC, SF, and Stockholm. We've hit 9-figure ARR and recently raised a Series B at a $1.1B valuation. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno. Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

We are looking for strong engineers with experience training production machine learning models. If you are interested in contributing to open-source projects and evolving Modal's infrastructure to train the next generation of language models, we'd love to hear from you!

Requirements

5+ years of experience writing high-quality, high-performance code.
Experience working with torch and high-level training frameworks (Hugging Face, verl, slime).
Experience with ML training optimization (tell us a story about eliminating data-loading bottlenecks, overlapping communication with compute, rewriting a trainer to handle off-policy rollouts, etc.).
Ability to work in person in our NYC or San Francisco office.