Platform Operations Engineer

RTX
Marlborough, MA
Hybrid

About The Position

The Platform Reliability Engineer will report to the Digital Infrastructure Services organization and support the design, implementation, and maintenance of enterprise-wide orchestration and container management platforms based on Kubernetes. These platforms support program software, solutions, and products across the organization.

The Raytheon Orchestration and Container Kubernetes Service (ROCKS) team provides a standardized container management platform built on Kubernetes. ROCKS supports:

  • A secure, enterprise container orchestration foundation for the Digital Ecosystem
  • Deployment in air-gapped classified environments
  • Non-production services in shared unclassified environments
  • Integration across cloud and on-premises environments

Requirements

  • Typically requires a Bachelor’s degree in Science, Technology, Engineering, or Mathematics (STEM), or equivalent experience, and a minimum of 5 years of prior relevant experience; or an advanced degree in a related field and a minimum of 3 years of experience
  • Experience installing, deploying, monitoring, and supporting Kubernetes clusters in on-premises and cloud environments
  • Experience with Kubernetes platforms including Rancher RKE2, upstream Kubernetes, OpenShift Container Platform, VMware TKG/Tanzu, or similar Kubernetes distributions
  • Experience with Kubernetes-related tools and technologies including Terraform, Helm, Python, Go, and Bash
  • Experience with observability and monitoring tools including Grafana, Prometheus, Alertmanager, and Loki
  • The ability to obtain and maintain a U.S. security clearance
  • U.S. citizenship is required, as only U.S. citizens are eligible for a security clearance

Nice To Haves

  • Advanced knowledge of Kubernetes architecture, operations, and supporting tools
  • Experience deploying, configuring, maintaining, and supporting Kubernetes clusters
  • Experience in cloud and hybrid environments including AWS GovCloud, Azure Government, VMware, bare metal, and restricted networks
  • Experience with highly available and resilient cluster design and operations
  • Experience implementing observability, monitoring, and alerting solutions for distributed systems
  • Deploying cloud native platforms and systems in classified and unclassified environments
  • Designing and operating scalable, secure, high-performance systems, platforms, and Kubernetes clusters
  • Working with VMware, AWS GovCloud, and Azure Government environments
  • Oral and written communication
  • Executing projects within schedules and budgets
  • Translating business and functional requirements into technical requirements and tasks
  • Documenting and diagramming technical systems
  • Working with CNCF Kubernetes components including service mesh, service discovery, package management, observability and monitoring, runtimes, and security
  • Working with GitOps and Kubernetes package management tools including ArgoCD, Packer, Helm, and Kustomize
  • Working in Agile environments with product owners and scrum masters
  • Implementing Kubernetes in air-gapped and regulated network environments
  • Root cause analysis of distributed system failures
  • Monitoring CNCF ecosystem developments and applying technologies to Kubernetes platforms

Responsibilities

  • Implement, support, and optimize Kubernetes-based container orchestration platforms across both unclassified and closed-area systems
  • Collaborate with engineering, program teams, and cross-functional partners to improve platform usage and identify enhancements
  • Diagnose and resolve complex Kubernetes-related issues in partnership with internal teams and stakeholders
  • Support escalation and resolution of higher-severity platform issues in alignment with established processes
  • Develop and enhance observability and monitoring capabilities to support error detection, defect reduction, and improved system performance
  • Improve Mean Time to Detect (MTTD), Mean Time to Resolve (MTTR), service availability, and customer experience
  • Implement and maintain monitoring tools, dashboards, and alerting systems aligned with operational best practices
  • Work with infrastructure, networking, and application teams to forecast capacity, scaling requirements, and system demand

Benefits

  • parental (including paternal) leave
  • flexible work schedules
  • achievement awards
  • educational assistance
  • child/adult backup care
  • medical
  • dental
  • vision
  • life insurance
  • short-term disability
  • long-term disability
  • 401(k) match
  • flexible spending accounts
  • employee assistance program
  • Employee Scholar Program
  • paid time off
  • holidays