Senior DevOps / MLOps Engineer

Zoetis · Fort Collins, CO
$105,000 - $145,000 · Remote

About The Position

As the leader in animal health, Zoetis is looking to recruit a Senior DevOps/MLOps Engineer into its world-class Veterinary Medicine Research and Development (VMRD) organization to operationalize AI/ML, scientific modeling, and digital twin workloads. You’ll build secure, scalable platforms and data pipelines across cloud and on‑prem/HPC, partnering closely with biologists and data scientists to translate scientific questions into reliable production systems.

Requirements

  • PhD in a quantitative field (computer science, ML, computational biology, applied math), or MS/BS with equivalent senior-level engineering experience in a scientific domain.
  • 6+ years building production systems; strong software engineering fundamentals.
  • Expert-level Python.
  • Strong experience with query languages and data-processing frameworks such as SQL, Cypher, and/or MapReduce.
  • Proficiency in one of: C++, Go, Rust, Java, or Scala.
  • Docker, Kubernetes, CI/CD (e.g., GitHub Actions), secure artifact/container registries.
  • Data pipeline orchestration (e.g., Databricks, Dagster, Kedro); streaming (Kafka or Redis); data modeling with SQL/NoSQL/graph.
  • MLOps: experiment tracking and model versioning (e.g., MLflow), model serving and monitoring.
  • Cloud (AWS/Azure/GCP) and on‑prem/HPC (e.g., Slurm) experience.
  • Experience on multidisciplinary projects and teams that include scientists and software engineers, with excellent communication skills for scientific stakeholders.

Nice To Haves

  • APIs and scientific apps: FastAPI; minimal UIs (Streamlit/React); scientific computing (NumPy, Pandas, SciPy).
  • DevOps/IaC: Terraform; GitOps (Argo CD/Flux); Helm/Kustomize; Docker/Kubernetes; secure registries and config.
  • Data engineering: dbt and feature stores; Parquet/Delta; schema/lineage with Avro/Protobuf, OpenLineage, Great Expectations.
  • Observability/SRE: Prometheus/Grafana; ELK/OpenSearch; OpenTelemetry; SLIs/SLOs and performance profiling/optimization.
  • Distributed compute and resilience: Dask, Ray, Spark; HPC/Slurm; GPU scheduling; service mesh (Istio/Linkerd), API gateways, ingress; encryption/secrets/KMS, audit trails, backup/restore, DR.

Responsibilities

  • Build end‑to‑end DevOps/MLOps foundations: CI/CD for code/data/models, containerization/orchestration, artifact/registry management, and secure configuration.
  • Design and operate data engineering pipelines (batch/streaming) with data quality checks, lineage, schema contracts, and governance across lake/warehouse environments.
  • Productionize scientific and digital twin workflows into services/APIs and lightweight UIs with reproducibility, versioning, auditability, and compliant deployment.
  • Implement scalable training/inference (batch/real‑time) with observability, SLIs/SLOs, runbooks, incident response, and automated rollback strategies.
  • Run distributed/HPC jobs (including GPU) and optimize storage, throughput, and cost across on‑prem and cloud; collaborate with scientists on experiment design, data/compute needs, and validation.

Benefits

  • Healthcare
  • Dental coverage
  • Retirement savings benefits
  • Paid holidays
  • Vacation
  • Disability insurance

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Education Level: Ph.D. or professional degree
  • Number of Employees: 5,001-10,000
