Senior Site Reliability Engineer

Steampunk•McLean, VA

31d

About The Position

Design. Disrupt. Repeat. Be an agent of change on a team committed to achieving client-focused, mission-driven excellence. Steampunk is looking for an experienced Site Reliability Engineer with an appetite for taking on new challenges.

Contributions

As a Sr. Steampunk Site Reliability Engineer, you will work with program development teams, infrastructure and platform services teams, and traditional operations and maintenance teams to embrace and embody a shared responsibility for the reliability of an organization's applications and infrastructure. As an SRE, your primary responsibility is to combine aspects of software engineering with traditional operations to maintain and improve the reliability, availability, and performance of cloud, infrastructure, and large-scale software systems and services while minimizing downtime and mitigating potential failures.

Requirements

One of the following: Bachelor's degree and 10 years of IT experience; Master's degree and 8 years of experience; or no degree and 14 years of experience
Eligible to obtain and maintain a government security clearance with the Department of Commerce
At least 3 years of hands-on industry experience in an SRE role
At least 10 years of software engineering experience with skills in Angular, Node, Java, Python, etc.
Knowledge and experience with Agile and DevSecOps methodologies
Experience with the following software/tools:
Source code and binary repository products and techniques (GitHub, GitLab, BitBucket, Artifactory, Nexus, etc.)
Infrastructure and Cloud Management tools such as AWS CloudWatch
Log Management and Analysis tools such as Splunk
Automation and Configuration Management tools such as Terraform or Puppet

Nice To Haves

Knowledge and experience with New Relic and/or other AIOps platforms
Programming skills in JavaScript, Ruby, and/or Go
Experience with Nginx, HAProxy, Docker, Kubernetes or similar technologies
Experience with messaging systems, collaboration software, application-based firewall and proxy server(s), and operating systems
Experience with Linux and Windows operating systems, along with scripting tools and techniques such as Bash, CSH, KSH, ZSH, and/or PowerShell
Experience with monitoring and alerting tools such as Prometheus, Grafana, and Datadog

Responsibilities

Work with program development teams, infrastructure and platform services teams, and traditional operations and maintenance teams to embrace and embody a shared responsibility for the reliability of an organization's applications and infrastructure.
Combine aspects of software engineering with traditional operations to maintain and improve the reliability, availability, and performance of cloud, infrastructure, and large-scale software systems and services while minimizing downtime and mitigating potential failures.
