Red Hat • Posted 9 months ago
$170,770 - $281,770/Yr
Full-time • Senior
Remote • Raleigh, NC
Professional, Scientific, and Technical Services

At Red Hat, we believe the future of AI is open, and we are on a mission to bring the power of open-source LLMs and vLLM to every enterprise. The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As leading developers and maintainers of the vLLM project and inventors of state-of-the-art techniques for model compression, our team provides a stable platform for enterprises to build, optimize, and scale LLM deployments.

We are seeking an experienced MLOps engineer to work closely with our product and research teams to scale state-of-the-art deep learning products and software. In this role, you will work with our technical and research teams to manage training and deployment pipelines, create DevOps and CI/CD infrastructure, and scale our current technology stack. Your primary responsibility will be to build and release the Red Hat AI Inference runtimes, continuously improve the processes and tooling used by the DevOps team, and find opportunities to automate procedures and tasks. If you want to help solve challenging technical problems at the forefront of deep learning, this is the role for you!

What you will do:

  • Collaborate with research and product development teams to scale machine learning products for internal and external applications
  • Create and manage training and deployment pipelines
  • Test to ensure correctness, responsiveness, and efficiency (a minimal example of such a check appears after this list)
  • Troubleshoot, debug, and upgrade development and test pipelines
  • Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management
  • Collaborate with cross-functional teams on market requirements and best practices
  • Keep abreast of the latest technologies and standards in the field
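To make the pipeline and testing bullets above concrete, here is a purely illustrative sketch, not part of the posting itself, of the kind of smoke test a CI/CD pipeline might run against an OpenAI-compatible inference endpoint such as one served by vLLM. The base URL, model id, and latency budget are placeholder assumptions.

```python
"""Minimal CI smoke test for an OpenAI-compatible inference endpoint.

Illustrative only: the endpoint URL, model id, and latency budget below are
placeholder assumptions, not values taken from the job posting.
"""
import sys
import time

import requests

BASE_URL = "http://localhost:8000"       # hypothetical inference server address
MODEL = "example-org/example-model"      # placeholder model id
LATENCY_BUDGET_S = 5.0                   # arbitrary responsiveness threshold


def main() -> int:
    # 1. Liveness: many OpenAI-compatible servers (vLLM's included) expose /health.
    health = requests.get(f"{BASE_URL}/health", timeout=10)
    if health.status_code != 200:
        print(f"health check failed: HTTP {health.status_code}")
        return 1

    # 2. Correctness and responsiveness: issue one small completion request
    #    and verify that non-empty text comes back within the latency budget.
    start = time.monotonic()
    resp = requests.post(
        f"{BASE_URL}/v1/completions",
        json={"model": MODEL, "prompt": "Say hello.", "max_tokens": 8},
        timeout=30,
    )
    elapsed = time.monotonic() - start
    resp.raise_for_status()
    text = resp.json()["choices"][0]["text"]

    if not text.strip():
        print("empty completion returned")
        return 1
    if elapsed > LATENCY_BUDGET_S:
        print(f"completion too slow: {elapsed:.2f}s > {LATENCY_BUDGET_S}s")
        return 1

    print(f"smoke test passed in {elapsed:.2f}s: {text!r}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A check along these lines would typically run as a post-deployment gate, so a rollout is promoted only once the endpoint answers correctly and within its latency budget.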

What you will bring:

  • 5+ years of experience in MLOps, DevOps, automation, and modern software deployment practices
  • Strong experience with Git, GitHub Actions (including self-hosted runners), Terraform, Jenkins, and common automation and monitoring technologies
  • Experience with Kubernetes/OpenShift (an illustrative sketch of this kind of automation follows this list)
  • Experience with Agile methodology
  • Experience with Cloud Computing using at least one of the following Cloud infrastructures: AWS, GCP, Azure, or IBM Cloud
  • Strong programming skills with proven experience implementing Python-based machine learning solutions
  • Solid troubleshooting skills
  • Ability to interact comfortably with the other members of a large, geographically dispersed team
  • Experience maintaining an infrastructure and ensuring stability while adding new features
  • Familiarity with contributing to the vLLM community is a plus
  • A bachelor's degree or higher in computer science, mathematics, or a related discipline is valued, but we prioritize technical prowess, initiative, problem-solving, and practical experience
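As a second illustrative sketch tied to the Kubernetes/OpenShift and Python bullets above (again, an assumption-laden example rather than anything prescribed by the posting), the script below uses the official Kubernetes Python client to wait for an inference Deployment to finish rolling out. The deployment name, namespace, and timeout are hypothetical; OpenShift exposes the same Deployment API, so the check applies there as well.

```python
"""Illustrative rollout check using the official Kubernetes Python client.

Sketch only: the deployment name, namespace, and timeout are hypothetical
values, not taken from the posting.
"""
import sys
import time

from kubernetes import client, config

DEPLOYMENT = "inference-runtime"   # hypothetical Deployment name
NAMESPACE = "model-serving"        # hypothetical namespace
TIMEOUT_S = 300                    # arbitrary rollout deadline


def main() -> int:
    # Load credentials from the local kubeconfig; inside a cluster,
    # config.load_incluster_config() would be used instead.
    config.load_kube_config()
    apps = client.AppsV1Api()

    deadline = time.monotonic() + TIMEOUT_S
    while time.monotonic() < deadline:
        dep = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE)
        desired = dep.spec.replicas or 0
        ready = dep.status.ready_replicas or 0
        print(f"{DEPLOYMENT}: {ready}/{desired} replicas ready")
        if desired > 0 and ready == desired:
            return 0
        time.sleep(10)

    print(f"rollout of {DEPLOYMENT} did not complete within {TIMEOUT_S}s")
    return 1


if __name__ == "__main__":
    sys.exit(main())
```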

Benefits:

  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!