Software Engineer (Kubernetes)

Xcel Engineering
Oak Ridge, TN

About The Position

XCEL Engineering is seeking a qualified Software Engineer (Kubernetes) to design and develop custom Kubernetes Operators that extend the orchestration of high-performance workloads and secure data workflows at scale. This role is central to enabling American Science Cloud's AI and HPC platforms, ensuring that containerized research applications run seamlessly across heterogeneous compute and data environments.

Requirements

  • United States citizen with the ability to obtain a security clearance.
  • Bachelor's degree in Computer Science, Information Technology or a related technical field.
  • Experience with the following key technologies and tools:
      • Languages: Go, Python, Bash
      • CI/CD: GitLab CI, ArgoCD
      • IaC/Config Management: Puppet, Helm, Ansible
      • Kubernetes & Ecosystem: On-prem K8s, Custom Operators, Service Mesh, K8s architecture
      • Operating Systems: Linux-based OS management at the hardware level, strong Linux sysadmin skills

Nice To Haves

  • Prior Istio operator development or service mesh integration experience.
  • Familiarity with WebAssembly plugin development for Istio or Kubernetes.
  • Background in HPC platforms, GPU-based AI training environments, or large-scale distributed systems.
  • Exposure to DOE computing ecosystems (ALCF, OLCF, NERSC, ESnet, HPDF).
  • Experience with containerized scientific workflows and secure data-sharing architectures.

Responsibilities

  • Custom Kubernetes operator development
      • Design, implement, maintain, modify, and test custom Kubernetes operators written in Go and/or Ansible.
      • Enhance existing software development processes, practices, and standards.
      • Evaluate tooling in test environments for performance, feature set, and maintainability, especially for components that must work reliably with on-premises hardware and OS requirements.
      • Support the use and understanding of in-house Kubernetes operators and serve as a maintainer for those controllers.
  • Architecture & Infrastructure as Code and Tooling
      • Develop and implement an Architecture as Code process for the Slate platform.
      • Write and maintain infrastructure and deployment code using tools such as ArgoCD (GitOps), Puppet (OS management), Go, Python, Bash, Ansible, Terraform, and GitLab CI.
      • Engage with development teams to understand platform needs and tailor the cluster experience to meet evolving requirements.
  • Technical Leadership for Software Engineering
      • Provide software development guidance, code reviews, and pair-programming support to a team of 11 engineers.
      • Contribute to onboarding, team documentation, and process improvement initiatives.
      • Act as the go-to technical expert for Kubernetes custom operator questions across the engineering organization.
  • Collaboration
      • Partner closely with internal cybersecurity and development teams to ensure the platform's custom operators meet security, compliance, and usability expectations.
      • Participate in cross-functional projects related to platform enhancements, cluster lifecycle automation, and infrastructure provisioning.