Site Reliability Engineer

Booz Allen Hamilton•USA, DC

1d•$99,000 - $225,000

About The Position

The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network engineering, systems administration, or software development, if you have a passion for making systems better, we need you! As a site reliability engineer on our team, you’ll lead the development of more robust systems by building a resilient infrastructure. You’ll build in redundancy, implement monitoring tools, and automate wherever possible. You’ll reduce toil by scripting routine tasks and automating self-repair. This is your chance to leverage your expertise in cloud technologies while supporting your team of engineers and acting as a subject matter expert for your clients. Work with us as we help deliver a scalable, secure, and intelligent payment ecosystem that meets modernization goals and public expectations for transparency and service quality. Join us. The world can’t wait.

Requirements

2+ years of experience leading teams
Experience deploying, maintaining, or troubleshooting complex applications at an enterprise scale
Experience with CloudWatch, CloudTrail, Splunk/ITSI, and PagerDuty
Experience working with Unix or Linux, AWS, and SaaS and PaaS implementations
Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
Master’s degree in CS, Engineering, or IT and 8+ years of experience working with key indicators for IT system operability, reliability, application performance, or code quality, or 10+ years of such experience in lieu of a degree

Nice To Haves

Experience with test-driven development, distributed systems, microservices, and cloud-native application implementation
Experience with CI/CD tools, including GitLab Runners, GitHub Actions, and Jenkins, as well as Git and system administration
Experience working in an Agile framework, including Kanban and Scrum
Possession of excellent written and verbal communication skills
Possession of excellent critical-thinking and error assessment skills

Responsibilities

Lead the development of more robust systems by building a resilient infrastructure
Build in redundancy
Implement monitoring tools
Automate wherever possible
Reduce toil by scripting routine tasks and automating self-repair
Support your team of engineers and act as a subject matter expert for your clients