ML Platform / MLOps Engineer

Profluent • Emeryville, CA

$180,000 - $250,000

About The Position

Profluent is an AI-first protein design company. Founded in 2022, we develop deep generative models to design and validate novel, functional proteins to revolutionize biomedicine. Based in Emeryville, CA, we are backed by leading investors including Altimeter Capital, Bezos Expeditions, Spark Capital, Insight Partners, Air Street Capital, AIX Ventures, and Convergent Ventures, and have raised over $150M to date.

As we continue to push the boundaries of what is possible, we’re seeking an ML Platform / MLOps Engineer on the machine learning team to build and operate the infrastructure that powers our machine learning systems. You will work closely with machine learning scientists and engineers to develop reliable, scalable platforms for training, evaluating, and deploying large-scale generative biology models. As an early member of the company, you’ll have significant ownership over the systems and tools that enable our research team to move quickly from experiments to production models.

What You'll Work On

Infrastructure supporting large-scale generative models for proteins
Systems that process massive biological datasets
Experimentation platforms that enable rapid iteration by ML researchers
Production services powered by machine learning models

Requirements

BS in Computer Science or a related field
3+ years of experience building or operating production ML systems
Experience with MLOps, ML infrastructure, or ML platform engineering
Strong experience with cloud infrastructure (GCP preferred)
Experience working with containerized workloads and orchestration systems (Kubernetes, Docker)
Experience building data or ML pipelines
Familiarity with CI/CD and infrastructure-as-code practices

Nice To Haves

Experience with the challenges of working with large-scale ML models
Experience transitioning research ideas into production
Familiarity with ML frameworks and tools such as PyTorch and MLflow
Interest in the intersection of biology and AI

Responsibilities

Develop infrastructure that enables researchers to run large-scale ML training and inference workloads reliably and efficiently on GPU clusters
Implement and maintain security best practices across our ML infrastructure, including access control, secrets management, and least-privilege policies
Monitor and optimize infrastructure performance, reliability, and cost
Build and maintain machine learning pipelines to support model inference workloads
Implement CI/CD pipelines for machine learning models and services
Develop tooling that helps researchers move quickly from experiments to production models
Maintain infrastructure for model serving and internal APIs