Kubernetes Site Reliability Engineer

The Aerospace Corporation•El Segundo, CA

10d•$129,000 - $193,500•Onsite

About The Position

The Aerospace Corporation is the trusted partner to the nation’s space programs, solving the hardest problems and providing unmatched technical expertise. As the operator of a federally funded research and development center (FFRDC), we are broadly engaged across all aspects of space, delivering innovative solutions that span satellite, launch, ground, and cyber systems for defense, civil, and commercial customers. When you join our team, you’ll be part of a special collection of problem solvers, thought leaders, and innovators. Join us and take your place in space.

The Digital Innovation Division (DID) is accountable for integrating strategies, providing governance, and managing internal investments that form the foundation of Aerospace’s digital innovation and transformation. The DID Mission IT pillar supports engineering teams across Aerospace by delivering top-tier IT engineering and IT support services tailored to meet the unique needs of our engineering community.

Mission IT Operations is seeking a skilled Site Reliability Engineer with deep expertise in Kubernetes, Linux, programming, and automation. In this role, you will be responsible for developing and maintaining both on-premises and cloud-based Kubernetes clusters that form the core of an overall Platform as a Service (PaaS), providing essential support to our engineering team. As part of a multidisciplinary platform and infrastructure team, you will manage multiple Kubernetes clusters used for technical analyses such as space launch telemetry analysis and modeling and simulation, as well as for Artificial Intelligence (AI) Large Language Model (LLM) training and inference services. Collaborating closely with rocket scientists and engineers, you will contribute to the development of innovative solutions to complex challenges within the space enterprise, supporting critical national space assets.

This position requires a strong sense of shared responsibility and ownership, working alongside cross-functional team members to achieve our mission objectives.

Requirements

Bachelor’s degree in a STEM field, Computer Science, or another related science or engineering discipline
8 or more years of relevant experience directly related to developing and delivering complex large-scale distributed software systems solutions and technical products
Minimum of 5 years of experience supporting highly available enterprise environments, including maintaining system uptime and service availability targets
At least 2 years of hands-on experience managing existing Kubernetes environments, with responsibility for deployment, upgrades, patching, and backups
Full ownership of and engineering responsibility for production Kubernetes services, both on-premises and on cloud service providers such as AWS and Azure
Ability to identify and resolve engineering problems independently
Experience in Linux systems administration, including configuration, for an enterprise environment
Strong understanding of networking and storage fundamentals
Experience automating repetitive tasks with scripting or DevOps tools
Ability to obtain a TS/SCI security clearance with polygraph, which is issued by the U.S. government
U.S. citizenship is required to obtain a security clearance.
12 or more years of relevant experience directly related to developing and delivering complex large-scale distributed software systems solutions and technical products
8 years of experience supporting a highly available enterprise environment
Experience architecting and deploying secure cloud (e.g., AWS, Azure) and/or Kubernetes environments from scratch
Experience performance tuning and capacity planning cloud (e.g., AWS, Azure) and/or Kubernetes environments

Nice To Haves

A current and active U.S. Government TS/SCI security clearance and polygraph
Certified Kubernetes Administrator (CKA), Red Hat Certified System Administrator (RHCSA), or Red Hat Certified Engineer (RHCE)
Experience managing Kubernetes clusters using Rancher
Experience deploying/supporting persistent container storage on Kubernetes (e.g., Portworx, Rook Ceph, OpenEBS, Longhorn)
Experience in Linux performance tuning and security hardening for an enterprise DoW environment
Experience with VMware or Harvester virtualization infrastructures
Experience with automated provisioning, configuration management, Infrastructure-as-Code, and GitOps (e.g., ArgoCD, Ansible, Terraform, Puppet, Packer, Bash, Golang, Python)
Experience with Agile and Scrum

Responsibilities

Developing and sustaining advanced services for our Kubernetes-based PaaS (e.g., Coder workspaces, Kueue batch scheduling, Knative serverless, Crossplane control planes)
Managing Kubernetes for production on-premises and cloud environments (e.g., AWS, Azure) with end-to-end responsibility for deployment, upgrades, patching, performance tuning, capacity planning, and backups/DR
Performing frequent, full security patching of all layers of the Kubernetes infrastructure while maintaining very high uptime
Taking ownership of and engineering responsibility for production AWS and Kubernetes services
Identifying and resolving engineering problems across the full Kubernetes stack independently
Ensuring successful real-time analysis of telemetry data from space launch partners, such as SpaceX, United Launch Alliance (ULA), and Blue Origin
Providing after-hours support for Kubernetes infrastructure troubleshooting during launch events
Supporting scientists and engineers running applications in Kubernetes
Providing Linux expertise and troubleshooting
Evaluating and testing new products and technologies
Using code to enhance and automate operations

Benefits

Comprehensive health care and wellness plans
Paid holidays, sick time, and vacation
Standard and alternate work schedules, including telework options
401(k) Plan: Employees receive a total company-paid benefit of 8%, 10%, or 12% of eligible compensation based on years of service and matching contributions; employees are immediately eligible and vested in the plan upon hire
Flexible spending accounts
Variable pay program for exceptional contributions
Relocation assistance
Professional growth and development programs to help advance your career
Education assistance programs
An inclusive work environment built on teamwork, flexibility, and respect