At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges. If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.

About the role:

In the Cloud Inference team, we are focused on building end-to-end distributed LLM inference deployments that are fully vertically integrated with the MAX stack. Our goal is to make inference both the fastest and the most scalable available, and to make those systems repeatable across new model architectures. We're seeking engineers who are passionate about pushing the boundaries of distributed inference systems and enjoy working at the intersection of large-scale systems and machine learning. We evaluate candidates on the breadth and depth of their experience in backend engineering, AI inference, and distributed systems development. If this sounds exciting, we invite you to join our world-leading AI infrastructure team and help drive our industry forward!

LOCATION:

Candidates based in the US or Canada are welcome to apply. You can work out of our office in Los Altos, CA or remotely from home. To support growth and collaboration, those in earlier career stages work in a hybrid capacity at our Los Altos, CA office (minimum 2 days per week on-site), with relocation assistance provided for out-of-state candidates. Senior team members have the flexibility to work in the office or remotely. Onboarding for new hires is conducted in person at our Los Altos, CA office.