At Red Hat, we believe the future of AI is open, and we are on a mission to bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As leading contributors and maintainers of the vLLM and LLM-D projects and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to build, optimize, and scale LLM deployments.

As a Machine Learning Engineer focused on vLLM, you will be at the forefront of innovation, collaborating with our team to tackle the most pressing challenges in model performance and efficiency. In this role, you will build and maintain the subsystems that allow vLLM to speak the language of tools. You will bridge the gap between probabilistic token generation and deterministic schema compliance, working directly on tool parsers that interpret raw model outputs and structured output engines that guide generation at the logit level.

If you are someone who wants to contribute to solving challenging technical problems at the forefront of deep learning in the open source way, this is the role for you. Join us in shaping the future of AI Inference!
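To illustrate what "guiding generation at the logit level" means, here is a minimal, self-contained sketch of constrained decoding. It is not vLLM's implementation; the toy vocabulary, the `allowed_next` grammar, and all function names are illustrative assumptions. The core idea is real, though: at each step, a structured-output engine computes which tokens keep the output schema-valid and masks every other logit to negative infinity before sampling.

```python
import math

# Toy vocabulary standing in for a real tokenizer's vocab (illustrative).
VOCAB = ["{", "}", '"name"', ":", '"Ada"', "hello"]

def allowed_next(tokens):
    """Toy 'grammar': the only valid continuation is the fixed
    JSON object {"name":"Ada"}, one token at a time."""
    expected = ["{", '"name"', ":", '"Ada"', "}"]
    if len(tokens) < len(expected):
        return {expected[len(tokens)]}
    return set()  # generation is complete

def mask_logits(logits, allowed):
    # Schema-invalid tokens get -inf, so they can never be sampled.
    return [score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)]

def greedy_generate(max_steps=10):
    tokens = []
    for _ in range(max_steps):
        allowed = allowed_next(tokens)
        if not allowed:
            break
        logits = [0.0] * len(VOCAB)  # stand-in for real model scores
        masked = mask_logits(logits, allowed)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        tokens.append(VOCAB[best])
    return "".join(tokens)

print(greedy_generate())
```

In a production engine the grammar is compiled from a JSON Schema or tool signature into an automaton, and the mask is applied to the model's actual logits on every decode step, but the contract is the same: the model proposes, the mask guarantees schema compliance.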
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000