Site Reliability Engineer (SRE) (TS)

Koniag Government Services, LLC
Washington, DC
$158,000 - $178,000
Onsite

About The Position

Koniag Management Solutions, LLC (KMS), a Koniag Government Services (KGS) company, is hiring a Site Reliability Engineer (SRE). The position requires an active Top Secret/SCI clearance and the ability to meet additional security requirements. We are seeking an experienced SRE who blends software engineering and systems administration practices to ensure the reliability, availability, and performance of mission‑critical applications. The role focuses on automation, observability, and incident response while upholding strict Service Level Objectives (SLOs). The SRE will help build resilient systems that scale, automate manual processes, manage fleet‑wide configurations, and ensure robust system monitoring. The selected candidate will support operations at Joint Base Anacostia–Bolling and must maintain an active TS/SCI clearance.

Requirements

  • TS/SCI security clearance required; candidates without one will not be considered.
  • Security+
  • Cloud Associate (such as AWS Solutions Architect Associate, Azure AZ‑104, or Google Cloud Associate Cloud Engineer)
  • Terraform Associate
  • Cloud Professional/Architect (such as AWS Solutions Architect Professional or Azure Architect Expert)
  • CKA (Certified Kubernetes Administrator)
  • Strong understanding of Kubernetes, Rancher, Helm, Docker
  • Strong understanding of Cilium, Rook, Ceph, MinIO, S3, Portworx
  • Strong understanding of load balancing, ingress, and service networking
  • Strong understanding of Ansible, Terraform, Desired State Configuration
  • Strong understanding of Python, PowerShell, and scripting/automation
  • Strong understanding of distributed systems, cloud computing, and microservices architecture
  • Strong understanding of monitoring/observability practices and tools
  • Strong understanding of incident response frameworks and SLO-based operations

Nice To Haves

  • CKA (if not used to meet required cert)
  • RHCSA
  • AWS DevOps Engineer or AZ‑400
  • CCSP
  • Advanced observability certifications (Datadog, New Relic, Dynatrace, etc.)
  • Formal incident management or SRE‑focused training
  • Building scalable, fault-tolerant cloud-native systems across hybrid or multi‑cloud environments.
  • Developing or supporting enterprise CI/CD pipelines.
  • Managing complex Kubernetes clusters across on‑prem and cloud platforms.
  • Implementing enterprise observability stacks (e.g., Prometheus, Loki, Grafana, ELK, OpenTelemetry).
  • Supporting large-scale infrastructure within DoD or Intelligence Community environments.

Responsibilities

  • Ensure application reliability, performance, and availability through automation, monitoring, and systems engineering.
  • Develop infrastructure-as-code (IaC) solutions using Terraform, Ansible, and Desired State Configuration (DSC).
  • Build and manage containerized workloads using Kubernetes, Rancher, Docker, Helm, and related ecosystem tools.
  • Support service mesh and networking constructs such as Cilium, load balancing, ingress management, and distributed storage.
  • Engineer and maintain storage and object systems including Rook, Ceph, MinIO, and S3-compatible platforms.
  • Implement and maintain comprehensive observability platforms (metrics, logging, tracing) to support SLO monitoring and incident response.
  • Lead and participate in incident response activities, postmortem analysis, and reliability engineering improvements.
  • Develop automations, scripts, and tools using Python, PowerShell, and shell scripting.
  • Support CI/CD pipelines and cloud-native deployment methodologies.
  • Collaborate with development and operations teams to embed SRE practices into the application lifecycle.

Benefits

  • health, dental and vision insurance
  • 401(k) with company matching
  • flexible spending accounts
  • paid holidays
  • three weeks paid time off