Site Reliability Engineer

Arctiq, Norfolk, VA

About The Position

The Site Reliability Engineer will carry out and maintain reliability engineering practices for mission-critical government systems. Following the SRE Implementation Plan, you will bridge the gap between development and operations by applying a software engineering mindset to system administration. You will be responsible for building automation, maintaining CI/CD pipelines, and ensuring system health through robust monitoring.

Requirements

  • 3–5 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Proficiency in scripting languages (Python, Go, or Bash).
  • Hands-on experience with containerization (Docker, Kubernetes) and cloud platforms (AWS, Azure, or GCP).
  • Familiarity with NIST SP 800-53 security controls.
  • Bachelor’s degree in Computer Science or a related technical field.

Responsibilities

  • Implement and maintain dashboards and alerting rules using Prometheus, Grafana, or ELK Stack.
  • Support the identification of Service Level Indicators (SLIs).
  • Develop and maintain Infrastructure as Code (IaC) scripts using Terraform and Ansible to ensure repeatable, error-free deployments.
  • Maintain automated deployment pipelines, ensuring security scans and automated tests are integrated into the workflow.
  • Participate in on-call rotations and assist in troubleshooting system outages.
  • Contribute to blameless post-mortem reports to drive continuous improvement.
  • Identify repetitive manual tasks and develop automation to reduce "toil," allowing the team to focus on high-value engineering.