Principal MLOps Engineer, AI Inference

Red Hat•Raleigh, NC

333d•$170,770 - $281,770•Remote

About The Position

At Red Hat, we believe the future of AI is open, and we are on a mission to bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As leading developers and maintainers of the vLLM project, and inventors of state-of-the-art techniques for model compression, our team provides a stable platform for enterprises to build, optimize, and scale LLM deployments.

We are seeking an experienced MLOps engineer to work closely with our product and research teams to scale state-of-the-art (SOTA) deep learning products and software. As an MLOps engineer, you will manage training and deployment pipelines, create DevOps and CI/CD infrastructure, and scale our current technology stack. If you want to contribute to solving challenging technical problems at the forefront of deep learning, this is the role for you!

In this role, your primary responsibility will be to build and release the Red Hat AI Inference runtimes, continuously improve the processes and tooling used by the DevOps team, and find opportunities to automate procedures and tasks.

Requirements

5+ years of experience in MLOps, DevOps, Automation and modern Software Deployment practices
Strong experience with Git, GitHub Actions (including self-hosted runners), Terraform, Jenkins, and common technologies for automation and monitoring
Experience with Kubernetes/OpenShift
Experience with Agile methodology
Experience with Cloud Computing using at least one of the following Cloud infrastructures: AWS, GCP, Azure, or IBM Cloud
Strong programming skills with proven experience implementing Python-based machine learning solutions
Solid troubleshooting skills
Ability to interact comfortably with the other members of a large, geographically dispersed team
Experience maintaining an infrastructure and ensuring stability while adding new features
Familiarity with contributing to the vLLM community is a plus

Nice To Haves

Bachelor's degree or higher in computer science, mathematics, or a related discipline is valued, but we prioritize technical prowess, initiative, problem solving, and practical experience

Responsibilities

Collaborate with research and product development teams to scale machine learning products for internal and external applications
Create and manage training and deployment pipelines
Test to ensure correctness, responsiveness, and efficiency
Troubleshoot, debug and upgrade Dev & Test pipelines
Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management
Collaborate with a cross-functional team on market requirements and best practices
Keep abreast of the latest technologies and standards in the field

Benefits

Comprehensive medical, dental, and vision coverage
Flexible Spending Account - healthcare and dependent care
Health Savings Account - high deductible medical plan
Retirement 401(k) with employer match
Paid time off and holidays
Paid parental leave plans for all new parents
Leave benefits including disability, paid family medical leave, and paid military leave
Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Professional, Scientific, and Technical Services

Education Level

Bachelor's degree