AI Operations Engineering Technical Leader

Cisco
Milpitas, CA
Remote

About The Position

Cisco is looking for a highly experienced and innovative ML Operations Engineer to join our global DevOps team. In this critical role, you will drive the production readiness, deployment, and maintenance of scalable machine learning systems. You will work closely with a cross-functional team of data scientists, software development engineers, information security professionals, and DevOps engineers on creating secure and resilient ML pipelines.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field plus 8 years of DevOps experience; Master's degree plus 6 years of related experience; or PhD plus 3 years of related experience.
  • Understanding of CI/CD pipelines and automation tools.
  • Knowledge of cloud platforms: AWS at minimum, with Azure and GCP as a bonus
  • Proficiency in Python and familiarity with ML libraries (e.g., Scikit-learn, PyTorch, TensorFlow)
  • Strong understanding of ML lifecycle management and model versioning

Nice To Haves

  • Experience deploying large language models (LLMs) or generative AI systems
  • Familiarity with feature stores, vector databases, or data observability platforms
  • Excellent communication, collaboration, and mentoring skills.
  • Deep expertise in CI/CD tooling and practices, including hands-on experience with systems like Jenkins, GitLab, ArgoCD, or similar.
  • Strong proficiency in Kubernetes, Docker, and cloud-native patterns in AWS, Azure, or GCP.

Responsibilities

  • Design, build, and manage robust ML pipelines for training, validation, and deployment
  • Build and maintain scalable infrastructure using Kubeflow for ML experiments and inference in multiple public clouds
  • Implement CI/CD in GitHub for ML systems, ensuring reproducibility and traceability
  • Drive the implementation of LLM evaluation and observability solutions
  • Advocate automation in every layer of the infrastructure stack using Infrastructure as Code (IaC) principles and tools such as Terraform, Helm, and GitOps frameworks
  • Monitor models in production for performance degradation, drift, and fairness
  • Participate in the on-call rotation for ML Operations
  • Work closely with data scientists, engineers, and product managers to understand requirements and integrate models into applications

Benefits

  • medical, dental and vision insurance
  • a 401(k) plan with a Cisco matching contribution
  • paid parental leave
  • short- and long-term disability coverage
  • basic life insurance
  • 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
  • 1 paid day off for employee’s birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco
  • 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees (non-exempt)
  • flexible vacation time off program (exempt)
  • 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next
  • optional 10 paid days per full calendar year to volunteer
  • annual bonuses (for non-sales roles)
  • performance-based incentive pay (for sales roles)