Senior DevOps Engineer (AI Ops)

Adobe, San Jose, CA

About The Position

We are seeking a hands-on Senior DevOps Engineer specializing in AI Ops to own infrastructure provisioning, CI/CD automation, telemetry pipelines, and production deployment for AI-powered services, agents, and orchestration systems. This role builds and operates the infrastructure that keeps AI systems reliable, observable, and scalable in production. The engineer will help operationalize AI platforms by implementing intelligent monitoring, automated incident response, model lifecycle governance, and data-driven operational insights. The role is SRE-heavy and infrastructure-first, with responsibility for ensuring that production systems and services built on advanced technology are reliable, resilient, scalable, secure, and cost-effective.

Requirements

  • Hands-on senior DevOps experience, with a specialization in AI Ops
  • Ability to own infrastructure provisioning, CI/CD automation, telemetry pipelines, and production deployment for AI-powered services, agents, and orchestration systems
  • Ability to build and operate the infrastructure that enables reliable, observable, and scalable AI systems in production
  • Ability to help operationalize AI platforms by implementing intelligent monitoring, automated incident response, model lifecycle governance, and data-driven operational insights
  • Strong SRE and infrastructure-first background
  • Experience with Infrastructure as Code (e.g., Terraform)
  • Experience provisioning and maintaining Kubernetes clusters and supporting services
  • Experience automating environment setup across dev, stage, and production
  • Experience building and maintaining CI/CD pipelines for AI services, agent frameworks, orchestrators, and model artifacts
  • Experience implementing automated testing and reliability validation gates
  • Experience building safe rollback mechanisms for services and models
  • Experience integrating reliability and health checks into deployment workflows
  • Experience packaging, versioning, and deploying models and agent services in containerized environments
  • Experience managing artifact promotion across environments
  • Experience monitoring model and agent performance (latency, throughput, accuracy, cost)
  • Experience enabling safe rollout, rollback, and refresh workflows
  • Experience designing and operating scalable pipelines for collecting and processing logs, metrics, traces, and operational events
  • Experience enabling structured telemetry for AI services and orchestration systems to support real-time monitoring and operational insights
  • Experience integrating AIOps platforms
  • Track record of ensuring production reliability and applying SRE best practices
  • Eager to innovate with AI

Nice To Haves

  • Experience with AI Ops

Responsibilities

  • Design and manage cloud infrastructure using Infrastructure as Code (e.g., Terraform)
  • Provision and maintain Kubernetes clusters and supporting services
  • Automate environment setup across dev, stage, and production
  • Build and maintain CI/CD pipelines for AI services, agent frameworks, orchestrators, and model artifacts
  • Implement automated testing and reliability validation gates
  • Build safe rollback mechanisms for services and models
  • Integrate reliability and health checks into deployment workflows
  • Package, version, and deploy models and agent services in containerized environments while managing artifact promotion across environments
  • Monitor model and agent performance (latency, throughput, accuracy, cost) and enable safe rollout, rollback, and refresh workflows
  • Design and operate scalable pipelines for collecting and processing logs, metrics, traces, and operational events
  • Enable structured telemetry for AI services and orchestration systems to support real-time monitoring and operational insights
  • Integrate AIOps platforms
  • Ensure that production systems and services built on advanced technology are reliable, resilient, scalable, secure, and cost-effective

Benefits

  • Ongoing feedback through the Check-In approach
  • Meaningful benefits