Linux OS Engineer

STEM Solutions • Annapolis Junction, MD

About The Position

The Linux OS Engineer is expected to accomplish the following:

Platform Engineering & Operations

Engineer, deploy, and administer RHEL/CentOS/Rocky and Ubuntu servers (physical and virtual) across dev/test/prod.
Implement standard OS baselines, golden images, and immutable patterns (e.g., image pipelines, templates, kickstart/preseed, cloud-init).
Manage patching and kernel updates with maintenance windows and rollback plans; validate health post-change.

Configuration Management & Automation

Build and maintain automation using Ansible, Bash, and Python to enforce configuration drift control and speed delivery.
Integrate configuration policies with CI/CD workflows and version control; write idempotent playbooks/roles.

Security & Compliance

Apply and maintain DoD STIG/hardening controls; remediate findings from vulnerability scans; maintain audit artifacts.
Implement/validate FIPS, SELinux, disk encryption, log forwarding, and secure remote access; support ATO evidence.

Reliability, Monitoring & Performance

Design and operate logging/metrics/alerting (e.g., journald/rsyslog, auditd, Prometheus/Grafana/ELK) with actionable SLOs.
Troubleshoot OS, storage, and network performance (I/O, CPU, memory, TCP/IP) using system profiling tools.

High Availability & Continuity

Support clustering and HA patterns (Pacemaker/Corosync, systemd units, load balancers); document and test DR/backup restore.

Collaboration & Support

Provide Tier 3 escalation; create runbooks/SOPs; participate in on-call rotations as needed.
Partner closely with cybersecurity, network, storage, virtualization, and application owners to deliver end-to-end outcomes.

Requirements

Active TS/SCI + CI Poly clearance
5–7 years of professional experience administering Linux in enterprise or government environments (or an equivalent combination of education and experience suitable for a mid-level role).
One of the following DoD 8570/CS-NET-451 certifications: Security+ (CE), RHCSA/RHCE, CKA, ITIL® Foundation
Strong proficiency with RHEL-family and Debian/Ubuntu administration, system services, and package managers (dnf/yum/apt).
Hands-on with Ansible and scripting (Bash, Python) for automation and compliance.
Demonstrated experience applying STIG/hardening and working within secure change-controlled environments.
Solid troubleshooting across compute, network, and storage layers; familiarity with virtualization (VMware/KVM) and ticketing/ITSM workflows.
Ability to create and maintain SOPs, runbooks, and architecture diagrams supporting audits and operations.

Nice To Haves

Experience with Red Hat Satellite/Foreman, Kickstart, IdM/FreeIPA/AD integration, and secrets management (e.g., HashiCorp Vault).
Exposure to container runtimes (Podman/Docker), Kubernetes (RKE/AKS/OpenShift), and hardened container images.
Infrastructure-as-Code (Terraform), cloud (AWS/Azure/Gov zones), and registry/artifact management.
Monitoring stacks (Prometheus/Grafana, ELK/OpenSearch) and SIEM integrations.
SSCP
CySA+
GSEC
CASP+
CISSP
Red Hat Ansible Automation certification
Terraform Associate
Kubernetes certification
