Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.

About the Role

We're looking for a cloud orchestration engineer to build the operational backbone that keeps vLLM running reliably at massive scale. You'll design the systems for cluster management, deployment automation, and production monitoring that enable teams worldwide to serve AI models without friction. You'll ensure that vLLM deployments are observable, debuggable, and recoverable, turning operational complexity into infrastructure that just works.
Job Type: Full-time
Career Level: Senior
Education Level: Associate degree