Senior Site Reliability Engineer

Steampunk, McLean, VA
2d

About The Position

Design. Disrupt. Repeat. Be an agent of change on a team committed to achieving client-focused, mission-driven excellence. Steampunk is looking for an experienced Site Reliability Engineer with an appetite for taking on new challenges.

Contributions

As a Senior Site Reliability Engineer at Steampunk, you will work with program development teams, infrastructure and platform services teams, and traditional operations and maintenance teams to embrace and embody a shared responsibility for the reliability of an organization’s applications and infrastructure. As an SRE, your primary responsibility is to combine aspects of software engineering with traditional operations to maintain and improve the reliability, availability, and performance of cloud, infrastructure, and large-scale software systems and services while minimizing downtime and mitigating potential failures.

Requirements

  • Bachelor's degree and 10 years of IT experience, or Master's degree and 8 years of experience, or no degree and 14 years of experience
  • Eligible to obtain and maintain a government security clearance with the Department of Commerce
  • At least 3 years of hands-on industry experience in an SRE role
  • At least 10 years of software engineering experience with skills in Angular, Node, Java, Python, etc.
  • Knowledge and experience with Agile and DevSecOps methodologies
  • Experience with source code and binary repository products and techniques (GitHub, GitLab, BitBucket, Artifactory, Nexus, etc.)
  • Experience with infrastructure and cloud management tools such as AWS CloudWatch
  • Experience with log management and analysis tools such as Splunk
  • Experience with automation and configuration management tools such as Terraform or Puppet

Nice To Haves

  • Knowledge and experience with New Relic and/or other AIOps platforms
  • Programming skills in JavaScript, Ruby, and/or Go
  • Experience with Nginx, HAProxy, Docker, Kubernetes or similar technologies
  • Experience with messaging systems, collaboration software, application-based firewall and proxy server(s), and operating systems
  • Experience with Linux and Windows operating systems, along with scripting tools and techniques such as Bash, CSH, KSH, ZSH, and/or PowerShell
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, and Datadog

Responsibilities

  • Work with program development teams, infrastructure and platform services teams, and traditional operations and maintenance teams to embrace and embody a shared responsibility for the reliability of an organization’s applications and infrastructure.
  • Combine aspects of software engineering with traditional operations to maintain and improve the reliability, availability, and performance of cloud, infrastructure, and large-scale software systems and services while minimizing downtime and mitigating potential failures.