Associate Principal, Storage Engineering

OCC•Chicago, IL

Hybrid

About The Position

What You'll Do: The Lead Associate Principal, Linux Server Administration, is responsible for overseeing the design, implementation, and optimization of enterprise-wide Linux server infrastructure, with a focus on automation and containerization platforms across on-premises and cloud environments. This role provides technical leadership and strategic direction for Linux systems architecture, Ansible Automation Platform, Red Hat Satellite, OpenShift, and AWS cloud infrastructure, while mentoring team members and ensuring high availability, security, and performance across all Linux systems. The position serves as the primary technical authority for complex Linux server challenges and drives innovation in infrastructure automation, cloud-native development, hybrid cloud integration, and enterprise disaster recovery solutions.

Primary Duties and Responsibilities: To perform this job successfully, an individual must be able to perform each primary duty satisfactorily.

Requirements

10+ years of progressive hands-on experience in Linux/Unix system administration
5+ years in a technical leadership or senior engineering role
Strong hands-on experience with Ansible Automation Platform (AAP) including automation controller, execution environments, and workflow development
Proven expertise in Red Hat Satellite for system lifecycle management and content management
Extensive experience planning and executing enterprise-scale Linux patching programs including change management, patch testing, and emergency patching procedures
Demonstrated experience designing and implementing disaster recovery solutions for Linux infrastructure including backup/restore, replication, and failover strategies
Demonstrated experience planning and executing RHEL OS upgrades across major versions (e.g., RHEL 7 to 8, RHEL 8 to 9) using Leapp and other upgrade methodologies
Extensive hands-on experience with AWS Linux EC2 instances, including Amazon Linux and RHEL on AWS
Demonstrated experience in AMI creation, customization, hardening, and lifecycle management
Proven track record of building automated AMI pipelines using tools such as Packer, Ansible, or AWS Image Builder
Demonstrated experience with AWS cloud services and hybrid cloud architectures
Extensive hands-on experience with OpenShift container platform and Kubernetes orchestration
Demonstrated experience implementing and managing NFS and distributed storage solutions
Working knowledge of Red Hat Dev Spaces for development environment provisioning
Proven track record of designing and implementing large-scale automated Linux infrastructure in hybrid environments
Strong understanding of DevOps principles and CI/CD methodologies
Excellent problem-solving abilities and analytical thinking skills
Outstanding communication skills with ability to explain technical concepts to non-technical stakeholders
Strong project management capabilities and ability to manage multiple priorities
Advanced hands-on proficiency in Red Hat Enterprise Linux administration and troubleshooting
Extensive experience with Linux patching and patch management, including: enterprise-scale patch deployment using Red Hat Satellite and Ansible; patch testing and validation in non-production environments; emergency and zero-day vulnerability patching procedures; kernel patching strategies, including live patching (kpatch); patch rollback and recovery procedures; compliance reporting and audit trail maintenance; patch scheduling and maintenance window coordination; and AWS Systems Manager Patch Manager for cloud-based patching
Expert-level disaster recovery and business continuity experience, including: backup and restore strategies (Bacula, Veeam, AWS Backup, snapshots); replication technologies (rsync, DRBD, storage-level replication); multi-site and multi-region DR architectures; RPO/RTO analysis and optimization; failover and failback automation; DR testing and validation procedures; disaster recovery documentation and runbooks; and cloud-based DR solutions (AWS disaster recovery services)
Extensive experience with RHEL OS upgrade processes, including: in-place upgrades using the Leapp utility (RHEL 7→8, RHEL 8→9); pre-upgrade assessment and compatibility testing; application compatibility validation and remediation; upgrade automation using Ansible and Satellite; rollback and disaster recovery planning for upgrade failures; post-upgrade validation and system optimization; and managing kernel and package dependencies during upgrades
Expert-level experience with Ansible Automation Platform (AAP) including playbook development, roles, collections, automation controller, and execution environments
Strong expertise in Red Hat Satellite for provisioning, patch management, configuration management, and content views
Extensive hands-on experience with AWS Linux EC2 including instance management, auto-scaling groups, launch templates, and Amazon Linux/RHEL optimization
Expert-level experience with AMI creation and management, including: building custom AMIs using Packer, AWS Image Builder, or manual processes; AMI hardening and security baseline implementation; automated AMI patching and update workflows; AMI versioning, tagging, and lifecycle policies; golden image development and maintenance; cross-account and cross-region AMI sharing and distribution; and AMI testing and validation procedures
Proficiency with AWS services including VPC, security groups, IAM, EBS, EFS, S3, CloudWatch, Systems Manager, AWS Backup, and AWS CLI
Extensive hands-on experience with OpenShift/Kubernetes including deployment, cluster management, CI/CD pipelines, and troubleshooting
Proficiency with NFS configuration, performance tuning, and high-availability implementations
Experience with Red Hat Dev Spaces for containerized development environments
Expert-level scripting capabilities (Bash, Python, Perl)
Experience with infrastructure as code tools (Terraform, CloudFormation, AWS CDK, Packer)
Experience with additional configuration management tools (Puppet, Chef, SaltStack)
Deep knowledge of containerization technologies (Docker, Podman)
Strong networking knowledge (TCP/IP, DNS, DHCP, routing, firewalls, AWS networking)
Proficiency with monitoring and logging solutions (LogicMonitor, Splunk, Nagios, Prometheus, Grafana, ELK Stack, CloudWatch)
Experience with storage technologies (SAN, LVM, GlusterFS, Ceph, EBS, EFS, filesystems)
Experience with backup solutions (Bacula, Veeam, CommVault, AWS Backup, snapshots)
Knowledge of web servers (Apache, Nginx) and databases (MySQL, PostgreSQL, MongoDB, RDS)
Familiarity with version control systems (Git, GitLab, GitHub, CodeCommit)
Understanding of security frameworks and compliance standards (PCI-DSS, HIPAA, SOC 2, AWS Well-Architected Framework, CIS benchmarks)
Experience with GitOps practices and CI/CD pipelines
Knowledge of virtualization technologies (VMware, KVM, Xen)
Experience with high availability clustering (Pacemaker, Corosync)
Knowledge of package management (RPM, YUM, DNF) and repository management
Experience with API integrations and automation workflows
Bachelor's degree in Computer Science, Information Technology, or a related field
10+ years of relevant hands-on Linux system administration experience required
3+ years of hands-on experience with Ansible Automation Platform required
3+ years of experience with Red Hat Satellite and OpenShift preferred
3+ years of hands-on experience with AWS Linux EC2 and cloud infrastructure required
2+ years of hands-on experience with AMI creation, management, and automation required
Demonstrated experience executing RHEL OS upgrades across multiple major versions required
Extensive experience with enterprise Linux patching programs and disaster recovery planning required
Proven experience implementing CIS benchmarks and security hardening across Linux environments required
Working experience with ITSM tools (ServiceNow, JIRA, Confluence) for ticket management and documentation required

Nice To Haves

Master's degree in related field or equivalent combination of education and experience
Red Hat certifications (RHCE, RHCA, Red Hat Certified Specialist in Ansible Automation, or OpenShift certifications) strongly preferred
AWS certifications (AWS Certified Solutions Architect, AWS Certified SysOps Administrator, or AWS Certified DevOps Engineer) strongly preferred
Security certifications (Security+, CISSP, or CIS certification) a plus
ITIL Foundation certification a plus

Responsibilities

Lead the design, deployment, and maintenance of enterprise Linux server environments (RHEL, CentOS, Ubuntu, SUSE, Amazon Linux) with hands-on configuration and troubleshooting across on-premises and AWS cloud infrastructure
Plan, execute, and manage enterprise-wide Linux patching strategies including security patches, kernel updates, and critical vulnerability remediation across thousands of servers
Develop and maintain comprehensive disaster recovery (DR) plans for Linux infrastructure including RPO/RTO targets, failover procedures, and recovery testing schedules
Implement and enforce CIS (Center for Internet Security) benchmarks and security baselines across all Linux systems including automated compliance scanning, remediation, and reporting
Plan, execute, and manage RHEL operating system upgrades across enterprise environments including in-place upgrades (Leapp), migration strategies, and rollback procedures
Develop and implement infrastructure automation strategies using Ansible Automation Platform (AAP) including playbook development, workflow orchestration, and automation controller management
Manage and optimize Red Hat Satellite infrastructure for system provisioning, patch management, and content lifecycle management across the enterprise
Implement and manage automated patching workflows using Red Hat Satellite, Ansible, and AWS Systems Manager for both on-premises and cloud environments
Design, deploy, and manage AWS Linux EC2 instances including instance configuration, auto-scaling, and integration with AWS services
Create, maintain, and manage AMI (Amazon Machine Image) lifecycle including image hardening, patching, golden image development, and automated AMI pipeline creation
Implement AMI versioning strategies, testing procedures, and distribution processes across multiple AWS accounts and regions
Design and implement disaster recovery solutions including backup strategies, replication technologies, failover automation, and multi-region/multi-site architectures
Design and maintain NFS storage solutions and distributed file systems for enterprise applications
Architect, deploy, and manage OpenShift container platforms and Kubernetes environments in hybrid cloud configurations
Implement and support Red Hat Dev Spaces for cloud-native development workflows
Conduct regular DR drills and testing to validate backup and recovery procedures
Develop and maintain security hardening standards based on CIS benchmarks, STIG requirements, and organizational security policies
Manage incidents, requests, and change management processes using ITSM tools such as ServiceNow including ticket resolution, escalations, and SLA compliance
Maintain technical documentation, knowledge base articles, runbooks, and operational procedures in Confluence
Establish and enforce Linux server security standards, hardening procedures, and compliance protocols across on-premises and cloud environments
Oversee system performance monitoring, capacity planning, and optimization initiatives across all platforms
Provide escalation support for complex technical issues and lead incident response efforts
Collaborate with cross-functional teams including networking, storage, security, and application development
Drive continuous improvement initiatives and evaluate emerging Red Hat, AWS, and cloud-native technologies
Create and maintain comprehensive technical documentation, runbooks, and standard operating procedures
Participate in on-call rotation and provide 24/7 support for critical systems as needed
Lead vendor management activities and coordinate with Red Hat and AWS support
Provide technical mentorship and guidance to Linux administrators and junior team members
Lead technical training sessions and knowledge transfer initiatives on Ansible, Satellite, OpenShift, AWS, patching, and DR procedures

Benefits

A highly collaborative and supportive environment designed to encourage work-life balance and employee wellness
A hybrid work environment with up to 2 days per week of remote work
Tuition Reimbursement to support your continued education
Student Loan Repayment Assistance
Technology Stipend allowing you to use the device of your choice to connect to our network while working remotely
Generous PTO and parental leave
401k Employer Match
Competitive health benefits, including medical, dental, and vision
