Principal Engineer MLOps (DLP Detection)

Palo Alto Networks · Santa Clara, CA
Posted 85 days ago · $165,000 - $210,000 · Onsite

About The Position

We are looking for a Principal MLOps Engineer to lead the design, development, and operation of production-grade machine learning infrastructure at scale. In this role, you will architect robust pipelines, deploy and monitor ML models, and ensure reliability, reproducibility, and governance across our AI/ML ecosystem. You will work at the intersection of ML, DevOps, and cloud systems, enabling our teams to accelerate experimentation while ensuring secure, efficient, and compliant deployments. This role is based at our dynamic Santa Clara, California headquarters campus and requires being in the office three days a week; it is not a remote role.

Requirements

  • MS / PhD in Computer Science, Engineering, or related field, or equivalent military/industry experience.
  • 8+ years of software, DevOps, or ML engineering experience, including at least 3 years focused on MLOps or ML platform engineering.
  • Strong programming skills (Python, Go, or Java) with deep expertise in building production systems.
  • Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes, Docker).
  • Proven experience in ML infrastructure: model serving (TensorFlow Serving, TorchServe, Triton), workflow orchestration (Airflow, Kubeflow, MLflow, Ray, Vertex AI, SageMaker).
  • Hands-on experience with CI/CD pipelines, infrastructure-as-code (Terraform, Helm), and monitoring/observability tools (Prometheus, Grafana, ELK/EFK stack).
  • Strong knowledge of data pipelines, feature stores, and streaming systems (Kafka, Spark, Flink).
  • Understanding of model monitoring, drift detection, retraining pipelines, and governance frameworks (see the drift-check sketch after this list).
  • Ability to influence cross-functional stakeholders, define best practices, and mentor engineers at all levels.
  • Passion for operational excellence, scalability, and securing ML systems in mission-critical environments.
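
As a purely illustrative companion to the drift-detection requirement above, the minimal sketch below compares a training-time reference sample of a feature against a window of live production values using a two-sample Kolmogorov-Smirnov test. The data, significance level, and alerting action are assumptions for illustration, not part of this posting.

    # Illustrative drift check; the data, alpha, and the alert action are assumptions.
    import numpy as np
    from scipy.stats import ks_2samp

    def feature_drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
        """Flag drift when a two-sample KS test rejects 'same distribution' at level alpha."""
        result = ks_2samp(reference, live)
        return result.pvalue < alpha

    # Example: a shifted production window should trigger retraining or an alert.
    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature values
    live = rng.normal(loc=0.4, scale=1.0, size=1_000)       # recent production traffic
    if feature_drifted(reference, live):
        print("Drift detected: kick off the retraining pipeline and notify the on-call.")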

Responsibilities

  • Lead MLOps architecture: Design and implement scalable ML platforms, CI/CD pipelines, and deployment workflows across cloud and hybrid environments.
  • Operationalize ML models: Build automated systems for training, testing, deployment, monitoring, and rollback of ML models in production (see the health-gate sketch after this list).
  • Ensure reliability and governance: Implement model versioning, reproducibility, auditing, and compliance best practices.
  • Drive observability & monitoring: Develop real-time monitoring, alerting, and logging solutions for ML services, ensuring performance, drift detection, and system health.
  • Champion automation & efficiency: Reduce friction between data science, engineering, and operations by introducing best practices for infrastructure-as-code, container orchestration, and continuous delivery.
  • Collaborate cross-functionally: Partner with ML engineers, data scientists, security teams, and product engineering to deliver robust, production-ready AI systems.
  • Lead innovation in MLOps: Evaluate and introduce new tools, frameworks, and practices that elevate the scalability, reliability, and security of ML operations.
  • Optimize ML infrastructure for cost efficiency and reduced bootstrapping times across various environments.
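
To make the deploy/monitor/rollback loop above more concrete, here is a hedged sketch of a post-deployment health gate: it reads a canary error-rate metric through Prometheus's HTTP query API and decides whether to promote or roll back a new model version. The endpoint, metric name, and threshold are illustrative assumptions, not internal details of this role.

    # Hedged post-deploy health gate; the Prometheus URL, metric, and threshold are assumptions.
    import requests

    PROMETHEUS_URL = "http://prometheus.internal:9090"  # assumed monitoring endpoint
    QUERY = 'sum(rate(model_request_errors_total{deployment="canary"}[5m]))'  # assumed metric
    ERROR_RATE_THRESHOLD = 0.5  # errors per second; illustrative gate

    def canary_error_rate() -> float:
        """Read the canary's current error rate via Prometheus's /api/v1/query endpoint."""
        resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
        resp.raise_for_status()
        series = resp.json()["data"]["result"]
        return float(series[0]["value"][1]) if series else 0.0

    if canary_error_rate() > ERROR_RATE_THRESHOLD:
        print("Canary unhealthy: roll back to the previous model version.")
    else:
        print("Canary healthy: promote the new model version.")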

Benefits

  • Compensation offered for this position will depend on qualifications, experience, and work location.
  • Starting base salary is expected to be between $165,000 and $210,000 per year.
  • Offered compensation may also include restricted stock units and a bonus.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Senior
  • Industry: Computer and Electronic Product Manufacturing
  • Education Level: Master's degree
