At Modular, we optimize inference from kernel to cloud on one unified stack. We are building a differentiated cloud platform that delivers state-of-the-art inference performance from day one, then keeps getting better. As we learn the shape and patterns of each customer's workload, the platform adapts and improves performance automatically over time. The Performance Labs team builds the infrastructure that makes this possible at scale. We continuously apply the latest optimizations across kernels, the inference engine, and distributed systems so that customer workloads stay on the Pareto frontier of cost and performance. We get there through deep workload insights, a scalable platform, and close collaboration with engineering and product teams.

In this role, you will dig into real customer inference workloads, profile them end to end, and apply optimizations across kernels, the engine, and distributed systems that push each workload toward the Pareto frontier. You will build the tooling and platform that turn one-off performance wins into a repeatable, automated optimization loop, and you will work directly with engineering, product, and GTM to bring those gains to customers in production.
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed