Kubernetes / OpenShift AI Platform Engineer

TEKsystems
Chandler, AZ
$70 - $89
Hybrid

About The Position

We are seeking a Kubernetes / OpenShift AI Platform Engineer to design, build, and optimize enterprise-scale infrastructure supporting advanced AI/ML workloads. This role sits at the intersection of platform engineering, DevOps, and AI infrastructure, enabling model development, training, and real-time inference in a highly regulated environment. You will work cross-functionally with AI/ML engineers, data scientists, DevOps, and infrastructure teams to deliver scalable, secure, and high-performance AI platforms.

Requirements

  • 5–7+ years of experience with Kubernetes (production environments)
  • Strong experience with Red Hat OpenShift in enterprise environments
  • 5–7+ years of hands-on experience with Docker and containerization technologies
  • Strong proficiency in Python for automation and platform engineering
  • Solid experience working in Linux environments (systems, networking, storage)
  • Experience with AWS or other cloud platforms
  • Hands-on experience with Terraform and CI/CD tools (e.g., Jenkins)
  • Experience supporting AI/ML platforms, model deployment pipelines, or similar workloads
  • Deep understanding of Kubernetes architecture and cluster lifecycle management
  • Proven ability to operate in large-scale, fast-paced enterprise environments
  • Strong problem-solving and troubleshooting skills across distributed systems
  • Experience building platforms that support other engineering teams

Nice To Haves

  • Experience with AI/ML frameworks and inference servers such as PyTorch, TensorFlow, Triton Inference Server, and vLLM
  • Experience with agentic AI systems or intelligent agents
  • Familiarity with Kubernetes Operators and Helm
  • Familiarity with GitOps practices and platform standardization
  • Strong understanding of observability tooling (Prometheus, Grafana)
  • Strong understanding of Kubernetes/OpenShift security models (SCCs, RBAC, etc.)

Responsibilities

  • Design and manage Kubernetes and OpenShift clusters at enterprise scale
  • Build and optimize infrastructure for AI/ML model training and inference workloads
  • Develop automation for deployment, configuration, patching, and platform operations using Python
  • Support GPU-enabled workloads and high-performance compute environments
  • Implement and maintain CI/CD pipelines, GitOps workflows, and infrastructure-as-code (Terraform)
  • Ensure platform reliability, scalability, and performance optimization
  • Implement security best practices including RBAC, network policies, and secrets management
  • Enable observability through Prometheus, Grafana, and logging frameworks
  • Collaborate with engineering teams to standardize and streamline AI platform environments

Benefits

  • Medical, dental & vision
  • Critical illness, accident, and hospital coverage
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Savings Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)