Linux Systems Administrator - Cloud & Distributed Systems, Lead Associate

Peraton • Laurel, MD

5d • Onsite

About The Position

We are seeking a highly skilled Senior Cloud & Distributed Systems Engineer to support mission-critical, cloud-based data repositories serving users within an Intelligence Community environment. This role operates in a dynamic, high-tempo operational environment where requirements evolve rapidly in response to global events. The selected candidate will administer and engineer large-scale Hadoop and Accumulo clusters, ensure system reliability and security, and collaborate across infrastructure, networking, and security domains to maintain continuous mission availability. This is not a traditional system administration role; it is a reliability-focused distributed systems engineering position operating at scale. This position will provide after-hours on-call/call-in support.

Requirements

7+ years of Linux systems administration experience
5+ years scripting (Bash, Python, or Perl)
5 years of experience with a BS/BA, or 3 years with an MS/MA
Experience supporting large-scale distributed systems across multiple clusters
Hands-on experience with Hadoop, Accumulo, and distributed storage technologies
Experience with Kubernetes and Docker
Familiarity with automation tools (Puppet, Ansible, Salt) and IaC (Terraform or CloudFormation)
Knowledge of monitoring platforms (Prometheus, Grafana, ELK, Splunk)
Understanding of networking fundamentals (VLANs, TCP/IP, load balancing)
Experience with high-availability design, storage architecture, and disaster recovery
System hardening and security compliance experience
Active TS/SCI clearance with current polygraph
Security+ (DoD 8570 compliant)
One of the following: AWS Certified SysOps Administrator – Associate, AWS DevOps Engineer – Professional, Certified Kubernetes Administrator (CKA)

Responsibilities

Administer and optimize enterprise Linux systems across large distributed clusters (60+ nodes, multi-rack deployments)
Monitor system health, performance, and reliability; troubleshoot complex hardware, software, network, and cloud issues
Support and tune Hadoop (HDFS/YARN) and Accumulo environments
Engineer monitoring, observability, and alerting solutions
Automate infrastructure using scripting and Infrastructure-as-Code
Patch, harden, and secure systems in compliance with security standards
Manage LDAP-based user accounts and core Linux services (DHCP, DNS)
Support Kubernetes-based containerized workloads
Participate in architecture discussions and cross-team engineering efforts
Contribute to incident response and root cause analysis