OpenShift AI Ops Consultant

TEKsystems
Pennington, NJ
$80 - $90
Hybrid

About The Position

We’re seeking a senior AI Platform SRE / MLOps engineer to support and stabilize a production Generative AI platform running on Red Hat OpenShift. This is a hands-on, high-impact role focused on operational excellence, reliability engineering, and performance tuning of GPU-accelerated AI workloads in a regulated enterprise environment. You will act as a key technical resource within Dell’s delivery team, helping bring structure, stability, and scalability to an evolving GenAI platform.

Requirements

  • 8+ years in SRE / DevOps / Platform Engineering roles
  • Deep experience with Red Hat OpenShift or Kubernetes in production environments, including cluster administration, scaling, upgrades, and troubleshooting
  • Hands-on experience supporting AI/ML workloads in production
  • Proven experience with GPU-accelerated environments, including the NVIDIA stack (CUDA, Triton, TensorRT, etc.)
  • Strong SRE mindset: incident management, monitoring, uptime, and reliability engineering
  • Scripting/automation experience (Python, Bash, etc.)
  • Ability to operate independently in ambiguous, high-pressure environments

Nice To Haves

  • Experience in financial services or regulated environments
  • Familiarity with MLOps tooling (Kubeflow, MLflow, ArgoCD)
  • Knowledge of model optimization techniques (quantization, pruning)
  • Certifications: Red Hat (RHCE), CKA / CKS
  • Prior consulting or residency-style engagements

Responsibilities

  • Own day-to-day operations of a production GenAI platform running on OpenShift/Kubernetes
  • Diagnose and resolve performance, stability, and scaling issues across AI workloads
  • Optimize GPU-based inference pipelines using tools such as NVIDIA Triton Inference Server, TensorRT, and CUDA
  • Implement SRE best practices: monitoring and observability (Prometheus, Grafana, etc.), incident response and root cause analysis, and automation and runbook creation
  • Improve cluster performance, resource utilization, and reliability
  • Collaborate with stakeholders while operating with high autonomy and limited guidance
  • Ensure the platform adheres to enterprise governance, security, and compliance standards

Benefits

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Savings Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)