About The Position

Design, automate, deploy, and operate highly reliable cloud systems supporting mission-critical workloads for U.S. Government customers. This role is centered on DevSecOps and site reliability engineering, with a strong emphasis on deployment automation, operational stability, and system resilience across AWS GovCloud and AWS C2E environments. You will be responsible for the reliability and operability of Quindar’s platform in production, ensuring systems are observable, fault-tolerant, and require minimal manual intervention. Your work will directly impact mission success by improving system uptime, deployment velocity, and operational confidence in constrained and classified environments.

A key focus of this role is building and evolving automated deployment pipelines, hardened runtime environments, and repeatable infrastructure patterns that support secure and scalable operations in regulated environments. You will also support and improve Quindar deployments to air-gapped networks, driving consistency, reliability, and performance across all environments.

As the organization grows, you will help define and implement best practices for availability, latency, incident response, and service-level objectives (SLOs). This role includes participation in incident response and a 24/7 on-call rotation, with a strong mandate to eliminate toil through automation and continuously improve system reliability. You will collaborate closely with frontend, backend, and platform engineers to ensure systems meet performance, reliability, and mission assurance requirements.

Requirements

  • Strong experience with Kubernetes and containerized workloads in production environments
  • Hands-on experience operating clusters in AWS EKS, Rancher, or similar platforms
  • Experience supporting GovCloud, IL-enclave, or C2E environments
  • Deep experience with CI/CD systems and deployment automation (GitLab preferred)
  • Proficiency in Python and Infrastructure-as-Code tools (Terraform or similar)
  • Experience with observability platforms (Grafana LGTM stack, Datadog, or equivalent)
  • Strong understanding of distributed systems, APIs, databases, caching, and event-driven architectures
  • Solid networking fundamentals (VPCs, VPNs, load balancers, TLS, service connectivity)
  • Experience with Linux/Unix systems
  • Familiarity with cloud security best practices, enclave boundaries, and secure system design
  • Experience with identity and access management (AWS IAM, Auth0, Keycloak, ICAM patterns)
  • Strong Git fundamentals and experience supporting deployments across multiple classification levels
  • Bachelor’s degree in Computer Science or related field
  • 3+ years of professional experience as an SRE, DevOps, reliability, infrastructure, or platform engineer
  • Active U.S. Security Clearance (Secret or higher required; TS/SCI preferred); U.S. Citizenship required

Nice To Haves

  • Experience working toward ATO/authorization in federal, DoD, or IC environments preferred
  • Experience supporting deployments in GovCloud, C2S/C2E, or IL-enclave environments highly desirable

Responsibilities

  • Design, automate, deploy, and operate highly reliable cloud systems
  • Ensure systems are observable, fault-tolerant, and require minimal manual intervention
  • Build and evolve automated deployment pipelines, hardened runtime environments, and repeatable infrastructure patterns
  • Support and improve Quindar deployments to air-gapped networks
  • Define and implement best practices for availability, latency, incident response, and service-level objectives (SLOs)
  • Participate in incident response and a 24/7 on-call rotation
  • Collaborate closely with frontend, backend, and platform engineers

Benefits

  • We take work-life balance very seriously. We provide unlimited PTO, require employees to take at least 15 days off, and follow most US federal government holidays.
  • Mental health is just as important as physical health, so we provide quarterly health & wellness benefits.
  • Comprehensive health insurance for you and your family with 100% coverage for employees.
  • We encourage employees to save for retirement and provide 4% 401(k) matching.
  • We hold a 4-day company offsite every year. Previous locations include San Francisco, Nashville, Denver, Santa Fe, New Orleans, San Diego, Bozeman, and New York City.