Site Reliability Engineer, Senior Advisor

Peraton, Annapolis Junction, MD

About The Position

Join the Peraton Team as a Site Reliability Engineer (SRE3) and Help Secure Mission-Critical Systems!

We are seeking a highly experienced Site Reliability Engineer (SRE) to support large-scale, highly distributed systems in a mission-critical environment. This role requires a strong blend of software development and system administration expertise. It focuses on designing and implementing sustainable automation solutions that improve reliability, efficiency, and operational consistency.

The ideal candidate will leverage extensive experience managing large systems to develop tools that:

  • Reduce risk to production environments
  • Minimize human error
  • Eliminate labor-intensive and repetitive manual processes
  • Improve adherence to operational procedures
  • Serve as a force multiplier for monitoring and system administration teams

Automation solutions may include configuration management tools (e.g., Salt, Puppet), custom-developed GUIs for shift operations, or fully automated cluster-level solutions. The goal is to deliver sustainable tools that perform at or above the reliability of manual processes.

Peraton offers enhanced benefits to employees supporting our critical National Security programs; see Benefits below.

Requirements

  • Bachelor’s Degree with 12+ years of relevant experience
  • Master’s Degree with 10+ years of relevant experience
  • PhD with 7+ years of relevant experience
  • Active TS/SCI with current polygraph
  • AWS Developer – Associate | AWS Solutions Architect (Associate or Professional) | AWS SysOps Administrator – Associate | CKA/CKAD | Elastic Certified Engineer | Elastic Certified Observability Engineer
  • 7+ years software development/engineering experience including requirements analysis, development, integration, installation, testing, maintenance, and issue resolution
  • 7+ years system engineering/architecture in large-scale environments
  • 7+ years supporting distributed/parallel systems (e.g., HBase, Hadoop, Accumulo, Bigtable, Cassandra, Scality)
  • 7+ years scripting/automation using Python, Perl, or Ruby
  • 4+ years managing and monitoring cloud-based systems
  • Experience in system integration, health monitoring, incident management, and postmortem analysis
  • Cloud certification will be verified during the interview or offer process.

Responsibilities

  • Design and implement automation solutions for large-scale distributed systems
  • Develop software tools to support monitoring and system administration teams
  • Provide technical direction for development, integration, and testing of hardware/software systems
  • Manage and monitor large cloud-based environments
  • Conduct postmortem analysis and support incident management processes
  • Improve operational processes and system health visibility
  • Support distributed, massively parallel data environments

Benefits

  • Heavily subsidized medical, dental, and vision coverage for employees and their dependents
  • Eligibility to participate in a competitive bonus plan
  • Generous PTO plan