The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world. As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state-of-the-art AI models on Cerebras hardware. You will operate at the intersection of machine learning, distributed systems, and high-performance runtime engineering, translating cutting-edge research into production-ready infrastructure to serve a variety of text-only and multimodal models. This role combines technical leadership, people management, and execution ownership, with direct impact on Cerebras’ core inference platform.
Job Type: Full-time
Career Level: Manager
Education Level: Not specified
Number of Employees: 251-500