Xcel Engineering • posted 3 days ago
Full-time • Mid Level
Onsite • Oak Ridge, TN

XCEL Engineering is seeking a qualified applicant for a Kubernetes Principal Engineer/Architect position. As a Platform Engineer, you will architect, implement, and maintain the infrastructure underpinning our on-premises Kubernetes clusters, with a strong focus on scalability, reliability, and maintainability. You will lead the technical direction of our platform engineering initiatives, evaluate and integrate key technologies, and deliver a robust internal platform that powers development across the organization.

  • Platform Architecture & Implementation: Lead the design and technical implementation of on-premises Kubernetes clusters that replace and improve upon features previously provided by OpenShift.
  • Select, evaluate, and integrate critical components for networking, CI/CD tooling, OS management, service mesh, and Kubernetes operators (excluding observability, which is handled by a dedicated SRE sub-team).
  • Build test environments to evaluate tooling based on performance, feature set, and maintainability, especially for components that must work reliably with on-premises hardware and OS requirements.
  • Own upgrades, security hardening, monitoring integration, and scalability of all cluster infrastructure.
  • Write and maintain infrastructure and deployment code using tools such as ArgoCD (GitOps), Puppet (OS management), Go, Python, Bash, and GitLab CI.
  • Support the use and understanding of in-house Kubernetes operators and serve as a secondary maintainer for those controllers.
  • Collaborate on building a next-generation internal developer platform inspired by tools like Backstage or AWS Proton, focused on increasing development efficiency and security.
  • Work with the cybersecurity team to define secure image baselines and automate the patching pipeline for container images and golden base layers.
  • Engage with development teams to understand platform needs and tailor the cluster experience to meet evolving requirements.
  • Provide architectural guidance, code reviews, and pair programming support to a team of 8-12 engineers.
  • Contribute to onboarding, team documentation, and process improvement initiatives.
  • Act as a go-to technical expert for all Kubernetes platform questions across the engineering organization.
  • Partner closely with internal cybersecurity and development teams to ensure the platform meets security, compliance, and usability expectations.
  • Participate in cross-functional projects related to platform enhancements and cluster lifecycle automation.
  • Represent the Platforms team with vendors and with internal and external collaborators and partners.
  • Bachelor's degree in computer science or a closely related field and a minimum of 8 years of experience as a platform engineer.
  • At least 5 years of Kubernetes experience.
  • An equivalent combination of education and experience may be considered.
  • The ability to obtain and maintain a Department of Energy "Q" clearance is required.
  • This requires US Citizenship.
  • Languages: Go, Python, Bash
  • CI/CD: GitLab CI, ArgoCD
  • IaC/Config Management: Puppet, Helm
  • Kubernetes & Ecosystem: On-prem K8s, Custom Operators, Service Mesh
  • Operating Systems: Linux-based OS management at the hardware level
  • Excellent interpersonal/communication skills, and the ability to work as part of a team.
  • Strong working knowledge of Unix system fundamentals and common network protocols.
  • Experience managing Linux/UNIX operating systems in a heterogeneous environment.
  • Solid understanding of networked computing environment concepts.
  • Excellent understanding of networking, particularly Linux and Kubernetes networking.
  • Experience with instrumenting bare metal and VMware infrastructure.
  • Ability to develop and maintain programs and scripts that aid operations and automation, using shell (primarily Bash) and high-level languages (Python or Go).
  • Ability to proactively identify performance issues, problems, and areas for improvement.
  • Ability to identify requirements and to define, plan, and implement requisite solutions.
  • Ability to plan, organize, prioritize tasks, and complete assigned projects with minimal supervision.
  • Experience with continuous integration and continuous deployment methodologies.
  • An understanding of code review and familiarity with tools like GitHub and GitLab.
  • Experience using tools such as Nagios, Grafana and Prometheus to monitor systems, metrics, and create dashboards.
  • Experience designing and implementing highly available systems/services utilizing virtual machines and Kubernetes resources.
  • Experience participating in an open-source community with patches accepted upstream.
  • Experience deploying and maintaining automated configuration management software such as Puppet or Ansible.
  • Experience implementing systems-level security technologies like SELinux and following security best practices.