ML Ops Engineer

Raft, Tampa, FL (Onsite)

About The Position

As an ML Ops Engineer, you will collaborate with a cross-functional data team of AI/ML Engineers, DevSecOps Engineers, Product Owners, Data Engineers, Data Analysts, and Data Scientists. Your primary responsibility will be to design, build, and maintain the infrastructure and pipelines that enable machine learning model training, deployment, and scaling. You will manage distributed workloads across GPU-enabled Kubernetes clusters and ensure efficient resource orchestration between training and inference operations.

Requirements

  • 3+ years of relevant hands-on experience
  • Experience building and maintaining machine learning pipelines
  • Strong Python skills for defining and maintaining ML pipelines
  • Practical experience with PyTorch (TensorFlow experience acceptable)
  • Experience using Airflow for job orchestration, particularly for managing resources between training and inference workloads
  • Strong Kubernetes experience, including managing local clusters, running different Kubernetes distributions, and managing custom resource definitions (CRDs)
  • Experience with Istio networking in Kubernetes environments
  • Experience working with MinIO object storage
  • Must have hands-on experience running GPU workloads on Kubernetes
  • Fast learner and analytical thinker who is creative, hands-on, and a strong communicator
  • Able to work both independently and as part of a team
  • Excellent problem-solving skills and attention to detail

Nice To Haves

  • CENTCOM or DoD experience
  • Experience with GPU time-slicing on Kubernetes
  • Exposure to computer vision and/or large imagery formats such as NITF
  • Publications or GitHub repos showcasing your skills
  • Experience with Docker and container orchestration best practices

Responsibilities

  • Design, build, and maintain the infrastructure and pipelines that enable machine learning model training, deployment, and scaling
  • Manage distributed workloads across GPU-enabled Kubernetes clusters
  • Ensure efficient resource orchestration between training and inference operations

Benefits

  • Highly competitive salary
  • Fully covered health, dental, and vision insurance
  • 401(k) with company match
  • Take-as-you-need PTO + 11 paid holidays
  • Education and training benefits
  • Generous referral bonuses
  • And More!

What This Job Offers

  • Job type: Full-time
  • Career level: Mid-level
  • Education level: Not listed
  • Company size: 251-500 employees

© 2024 Teal Labs, Inc