At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges. If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.

About the role:
ML developers today face significant friction in taking trained models into deployment. They work in a highly fragmented space, with incomplete, patchwork solutions that require significant performance tuning and non-generalizable, model-specific enhancements. At Modular, we are building the next-generation AI platform (MAX) that will radically improve the way developers build and deploy AI models. We’re continuously working to improve the performance and scalability of MAX by extending existing features and adding new ones for users to try.

The Serve Optimizations team works cross-functionally across the entire Modular tech stack to implement cutting-edge optimizations and research for auto-regressive text generation, image generation, and beyond. Think Speculative Decoding, LoRA, Quantization, Chunked Prefill, Distributed Inference, and more.

LOCATION: Candidates based in the US or Canada are welcome to apply. You can work in our office in Los Altos, CA or remotely from home. Onboarding for new hires is conducted in person in our Los Altos, CA office.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
51-100 employees