Kubernetes Platform Engineer

Bay Systems Consulting Inc.
Berkeley, CA
Onsite

About The Position

We are seeking a Kubernetes Platform Engineer to join the Platform Engineering team as a hands-on individual contributor. This role focuses on day-to-day operations and administration of Kubernetes clusters, primarily on-premises (K3s/RKE2) with additional support for cloud environments on Google Cloud Platform (GCP) and Amazon Web Services (AWS). You will manage cluster lifecycle operations, implement and maintain Cilium-based networking, troubleshoot complex platform issues, and enable development teams to successfully deploy and operate their workloads. This position balances infrastructure operations with developer enablement, requiring both deep technical expertise and strong collaboration skills.

The Platform Engineering team is a small team within ESnet's Systems and Software department that is dedicated to streamlining the software development lifecycle by establishing standardized processes for building, configuring, and deploying applications. The team supports the engineering, implementation, and maintenance of ESnet's platform systems and services including GitLab, Ansible, and Kubernetes environments, with responsibility for both on-premises and cloud-based services deployed across Google Cloud Platform (GCP) and Amazon Web Services (AWS).

Requirements

  • Typically requires a minimum of 8 years of related experience with a Bachelor’s degree; or 6 years and a Master’s degree; or equivalent experience.
  • Demonstrated experience administering Kubernetes on on-premises infrastructure (K3s, RKE2, or similar bare-metal distributions)
  • Experience with cloud-managed Kubernetes (GKE and/or EKS)
  • Strong understanding of Linux networking fundamentals: iptables/nftables, routing tables, DNS, TCP/IP stack, network troubleshooting
  • Experience with GitOps methodologies and tools such as ArgoCD or Flux
  • Proficiency in scripting and automation: Bash, Python, Go
  • Production experience with Cilium CNI or an equivalent CNI
  • Ability to work collaboratively in a team environment and communicate technical concepts clearly
  • Understanding of Kubernetes security best practices including Pod Security Standards, RBAC, and secrets management
  • Experience with Google Cloud Platform (GCP) and/or Amazon Web Services (AWS)

Nice To Haves

  • Go programming experience for operator maintenance and platform tooling development
  • CKA (Certified Kubernetes Administrator) or CKS (Certified Kubernetes Security Specialist) certification
  • Background in BGP routing protocols and network engineering concepts
  • IPv6 networking experience
  • Infrastructure as Code experience with Terraform or Ansible
  • Experience with internal developer platform (IDP) tools such as Backstage or similar
  • Experience with service mesh technologies (Istio, Linkerd)
  • Excellent understanding of code review and familiarity with GitHub and GitLab workflows

Responsibilities

  • Manage the full lifecycle of Kubernetes clusters (on-premises K3s/RKE2, GKE, and EKS), including upgrades, security patching, scaling, and capacity planning
  • Troubleshoot cluster-level issues including control plane problems, node failures, and resource constraints
  • Implement and maintain cluster security hardening based on CIS benchmarks and organizational security policies
  • Manage etcd cluster health, backup procedures, and disaster recovery capabilities
  • Monitor cluster performance and optimize resource utilization across multi-tenant workloads
  • Coordinate with datacenter operations team for physical infrastructure changes and maintenance windows
  • Implement, configure, and maintain Cilium CNI across on-premises and cloud Kubernetes environments
  • Design and enforce network policies to achieve secure multi-tenant isolation
  • Troubleshoot complex pod networking issues including DNS resolution, service discovery, and connectivity problems
  • Configure and maintain BGP peering with physical network infrastructure for on-premises integration
  • Work with network engineering team on firewall rules, VLANs, IPv6 networking, and network architecture
  • Contribute to building a next-generation internal developer platform inspired by tools like Backstage, focused on increasing development efficiency and security
  • Work with the security team to define secure image baselines and automate the patching pipeline for container images
  • Assist development teams with deploying, configuring, and troubleshooting Kubernetes workloads
  • Review application deployment manifests and provide guidance on best practices and optimization
  • Develop and maintain platform documentation, runbooks, and self-service guides
  • Engage with development teams to understand platform needs and tailor the cluster experience to meet evolving requirements