Site Reliability Engineer, Senior

Booz Allen Hamilton • Washington, DC

About The Position

Site reliability engineering requires a powerful blend of engineering depth, operational focus, and systems thinking. As a Senior Site Reliability Engineer, you understand how to embed reliability into every stage of the engineering lifecycle, ensuring that systems are observable, predictable, and resilient. We’re looking for an SRE like you to elevate platform reliability and ensure financial systems consistently meet user expectations and regulatory demands.

At Booz Allen, you’ll leverage your expertise in SLIs/SLOs, automation, monitoring, and cloud-native architectures to safeguard and enhance mission-critical financial systems. You’ll help engineer and operate secure, scalable, highly resilient cloud platforms that meet rigorous performance, compliance, and security requirements. Join us to ensure seamless operation, continuous improvement, and operational excellence across the financial systems that millions depend on. The world can’t wait.

Requirements

5+ years of experience in Site Reliability Engineering, cloud operations, DevOps, or large‑scale production systems support
Experience designing, implementing, and measuring SLIs, SLOs, SLAs, and error budgets for distributed systems
Experience improving operational metrics, including MTTR, MTTD, and MTTF, and performing root-cause analysis and post-incident reviews
Experience administering, scaling, and troubleshooting applications in AWS
Experience with containers and orchestration, such as Docker or Kubernetes
Experience with scripting and automation using Python, Bash, or Go
Experience with infrastructure as code
Experience with observability and monitoring tools, such as Prometheus, Grafana, CloudWatch, OpenTelemetry, Splunk, or Elastic
Experience supporting CI/CD pipelines, such as GitHub Actions
Knowledge of cloud security, compliance, and operational best practices for production systems
Public Trust
Bachelor's degree

Nice To Haves

Experience with Terraform
Experience with secure or regulated cloud environments
Experience supporting or modernizing government or financial systems
Experience with event-driven or streaming platforms, such as Kafka, Confluent, or Amazon MSK
Experience with Chaos Engineering, reliability testing, or fault‑injection practices
Experience operating in multi‑cloud or hybrid environments
AWS, Azure, or GCP certification at the Associate or Professional level

Responsibilities

Leverage expertise in SLIs/SLOs, automation, monitoring, and cloud-native architectures to safeguard and enhance mission‑critical financial systems.
Help engineer and operate secure, scalable, highly resilient cloud platforms that meet rigorous performance, compliance, and security requirements.
Ensure seamless operation, continuous improvement, and operational excellence across financial systems.