Red Hat • Posted 7 months ago
$116,270 - $191,840/Yr
Full-time • Principal
Remote • Raleigh, NC
Professional, Scientific, and Technical Services

Are you ready to take a technical leadership role in shaping the quality of an open-source, Kubernetes-native AI platform that's redefining hybrid cloud? The Red Hat OpenShift AI (RHOAI) team is seeking a Principal Quality Engineer with deep experience in Kubernetes-native application testing and a strong foundation in Python and PyTest to lead quality strategy across our AI Model Serving offerings. You'll work in a highly collaborative team responsible for one of the core capabilities of OpenShift AI, contributing to open-source projects such as KServe, Kubeflow, and vLLM. Your work will directly impact enterprises running mission-critical AI workloads in hybrid and multi-cloud environments. This role requires excellent engineering skills as well as strategic vision, thought leadership, and the ability to mentor and influence others across multiple teams.

  • Lead the quality strategy and implementation for Kubernetes-native components in Model Serving, including Custom Resources, Controllers, and Operators.
  • Own and evolve automated test architecture with a focus on PyTest, CI/CD, integration testing, and end-to-end testing in Kubernetes environments.
  • Partner with engineering, product, and community teams to define testability requirements, ensure early validation, and prevent regressions.
  • Design tests that validate system-level properties including scalability, autoscaling, observability, and reliability for AI workloads.
  • Participate in and influence upstream communities (KServe, Kubeflow, ModelMesh, etc.), raising quality standards and sharing best practices.
  • Drive efforts to mock, simulate, and validate model serving use cases in hybrid cloud and on-prem environments.
  • Serve as a technical mentor and go-to expert for Python-based testing frameworks and Kubernetes-native validation strategies.
  • Take a lead role in debugging complex system-level issues, especially in multi-tenant, distributed AI systems.
  • Champion shift-left testing and early validation practices across the RHOAI stack.
  • Proven expertise with Kubernetes API development and testing (CRs, Operators, Controllers). Experience working directly with Custom Resources and reconciliation logic is essential.
  • Strong programming and testing experience in Python, especially with PyTest in large, scalable codebases. Golang knowledge is a plus.
  • Deep understanding of Kubernetes internals, networking, and lifecycle hooks. Experience with OpenShift is a plus.
  • Extensive knowledge of CI/CD pipelines, especially in containerized or cloud-native ecosystems (e.g., GitHub Actions, Tekton, Jenkins).
  • Strong knowledge of test strategy for ML model serving systems, including considerations for runtime performance, isolation, and failure recovery.
  • Experience with troubleshooting distributed systems and validating observability via Prometheus, Grafana, OpenTelemetry, etc.
  • A proven ability to lead technical projects and mentor others across teams and time zones.
  • Excellent communication skills and comfort presenting to engineers, managers, and external stakeholders.
  • Hands-on experience with KServe, ModelMesh, Ray, vLLM, or other model serving frameworks.
  • Familiarity with Red Hat Service Mesh, Istio, Knative, or similar serverless/K8s-native middleware stacks.
  • Experience with performance/load testing frameworks and chaos testing in Kubernetes.
  • Contribution history in open-source projects or technical leadership in community forums.
  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!