Modular · Posted 7 days ago
Full-time • Mid Level
Hybrid • Los Altos, CA
51-100 employees

At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges. If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.

About the role:
On the Cloud Inference team, we build end-to-end distributed LLM inference deployments that are fully vertically integrated with the MAX stack. Our goal is to make inference both the fastest and the most scalable available, and to make those systems easy to bring up for new model architectures. We’re seeking engineers who are passionate about pushing the boundaries of distributed inference systems and enjoy working at the intersection of large-scale systems and machine learning. We evaluate candidates on the breadth and depth of their experience in backend engineering, AI inference, and distributed systems development. If this sounds exciting, we invite you to join our world-leading AI infrastructure team and help drive our industry forward!

LOCATION: Candidates based in the US or Canada are welcome to apply. You can work out of our office in Los Altos, CA or remotely from home. To support growth and collaboration, those in earlier career stages work in a hybrid capacity at our Los Altos, CA office (minimum 2 days per week on-site), with relocation assistance provided for out-of-state candidates. Senior team members have the flexibility to work in-office or remotely. Onboarding for new hires is conducted in person at our Los Altos, CA office.

  • Build & ship Modular’s LLM focused inference services using best-in-class inference techniques (eg disaggregated inference, multi-node deployment of large models, high performance networking, high throughput batch processing, etc)
  • Build the distributed systems needed to support high performance inference (eg distributed kv-cache, expert parallel request routing & rebalancing, etc)
  • Push the envelope for operational excellence with request-to-kernel observability, multi-cloud deployments, clever autoscaling, cold-start optimizations, and more.
  • Collaborate with our kernels and genAI teams to achieve SOTA application performance by integrating SOTA kernel & serving optimizations with SOTA cluster optimizations.
  • Build helm charts, kubernetes operators, and more to make a create simple, effective, maintainable deployments.
What you’ll need:
  • 5+ years of experience in backend engineering
  • Experience working on high-scale ML inference infrastructure (traditional AI or GenAI)
  • Experience with Kubernetes and operating your own services
  • Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
  • Creativity and curiosity for solving complex problems, and a team-oriented attitude that enables you to work well with others
  • Strong identification with our core company values and culture
Nice to have:
  • Experience with high-performance computing and networking (RDMA, RoCE, InfiniBand, etc.)
  • Experience with LLM serving frameworks such as vLLM, SGLang, or TensorRT-LLM
  • Familiarity with Go
What we offer:
  • Amazing Team. Work alongside industry leaders and experts in AI infrastructure.
  • World-class Benefits. Premier insurance plans, up to 5% 401(k) matching, flexible paid time off, and more are available to you!
  • Competitive Compensation. We offer very strong compensation packages, including stock options.
  • Team-Building Events. We organize regular team onsites and local meetups in Los Altos, CA as well as in other cities.