Associate Principal, Storage Engineering

OCC
Chicago, IL
Hybrid

About The Position

What You'll Do:

The Lead Associate Principal, Linux Server Administration is responsible for overseeing the design, implementation, and optimization of enterprise-wide Linux server infrastructure with a focus on automation and containerization platforms across on-premises and cloud environments. This role provides technical leadership and strategic direction for Linux systems architecture, Ansible Automation Platform, Red Hat Satellite, OpenShift, and AWS cloud infrastructure while mentoring team members and ensuring high availability, security, and performance across all Linux systems. The position serves as the primary technical authority for complex Linux server challenges and drives innovation in infrastructure automation, cloud-native development, hybrid cloud integration, and enterprise disaster recovery solutions.

Primary Duties and Responsibilities:

To perform this job successfully, an individual must be able to perform each primary duty satisfactorily.

Requirements

  • 10+ years of progressive hands-on experience in Linux/Unix system administration
  • 5+ years in a technical leadership or senior engineering role
  • Strong hands-on experience with Ansible Automation Platform (AAP) including automation controller, execution environments, and workflow development
  • Proven expertise in Red Hat Satellite for system lifecycle management and content management
  • Extensive experience planning and executing enterprise-scale Linux patching programs including change management, patch testing, and emergency patching procedures
  • Demonstrated experience designing and implementing disaster recovery solutions for Linux infrastructure including backup/restore, replication, and failover strategies
  • Demonstrated experience planning and executing RHEL OS upgrades across major versions (e.g., RHEL 7 to 8, RHEL 8 to 9) using Leapp and other upgrade methodologies
  • Extensive hands-on experience with AWS Linux EC2 instances, including Amazon Linux and RHEL on AWS
  • Demonstrated experience in AMI creation, customization, hardening, and lifecycle management
  • Proven track record of building automated AMI pipelines using tools such as Packer, Ansible, or AWS Image Builder
  • Demonstrated experience with AWS cloud services and hybrid cloud architectures
  • Extensive hands-on experience with OpenShift container platform and Kubernetes orchestration
  • Demonstrated experience implementing and managing NFS and distributed storage solutions
  • Working knowledge of Red Hat Dev Spaces for development environment provisioning
  • Proven track record of designing and implementing large-scale automated Linux infrastructure in hybrid environments
  • Strong understanding of DevOps principles and CI/CD methodologies
  • Excellent problem-solving abilities and analytical thinking skills
  • Outstanding communication skills with ability to explain technical concepts to non-technical stakeholders
  • Strong project management capabilities and ability to manage multiple priorities
  • Advanced hands-on proficiency in Red Hat Enterprise Linux administration and troubleshooting
  • Extensive experience with Linux patching and patch management including:
      • Enterprise-scale patch deployment using Red Hat Satellite and Ansible
      • Patch testing and validation in non-production environments
      • Emergency and zero-day vulnerability patching procedures
      • Kernel patching strategies including live patching (kpatch)
      • Patch rollback and recovery procedures
      • Compliance reporting and audit trail maintenance
      • Patch scheduling and maintenance window coordination
      • AWS Systems Manager Patch Manager for cloud-based patching
  • Expert-level disaster recovery and business continuity experience including:
      • Backup and restore strategies (Bacula, Veeam, AWS Backup, snapshots)
      • Replication technologies (rsync, DRBD, storage-level replication)
      • Multi-site and multi-region DR architectures
      • RPO/RTO analysis and optimization
      • Failover and failback automation
      • DR testing and validation procedures
      • Disaster recovery documentation and runbooks
      • Cloud-based DR solutions (AWS disaster recovery services)
  • Extensive experience with RHEL OS upgrade processes including:
      • In-place upgrades using Leapp utility (RHEL 7→8, RHEL 8→9)
      • Pre-upgrade assessment and compatibility testing
      • Application compatibility validation and remediation
      • Upgrade automation using Ansible and Satellite
      • Rollback and disaster recovery planning for upgrade failures
      • Post-upgrade validation and system optimization
      • Managing kernel and package dependencies during upgrades
  • Expert-level experience with Ansible Automation Platform (AAP) including playbook development, roles, collections, automation controller, and execution environments
  • Strong expertise in Red Hat Satellite for provisioning, patch management, configuration management, and content views
  • Extensive hands-on experience with AWS Linux EC2 including instance management, auto-scaling groups, launch templates, and Amazon Linux/RHEL optimization
  • Expert-level experience with AMI creation and management including:
      • Building custom AMIs using Packer, AWS Image Builder, or manual processes
      • AMI hardening and security baseline implementation
      • Automated AMI patching and update workflows
      • AMI versioning, tagging, and lifecycle policies
      • Golden image development and maintenance
      • Cross-account and cross-region AMI sharing and distribution
      • AMI testing and validation procedures
  • Proficiency with AWS services including VPC, security groups, IAM, EBS, EFS, S3, CloudWatch, Systems Manager, AWS Backup, and AWS CLI
  • Extensive hands-on experience with OpenShift/Kubernetes including deployment, cluster management, CI/CD pipelines, and troubleshooting
  • Proficiency with NFS configuration, performance tuning, and high-availability implementations
  • Experience with Red Hat Dev Spaces for containerized development environments
  • Expert-level scripting capabilities (Bash, Python, Perl)
  • Experience with infrastructure as code tools (Terraform, CloudFormation, AWS CDK, Packer)
  • Experience with additional configuration management tools (Puppet, Chef, SaltStack)
  • Deep knowledge of containerization technologies (Docker, Podman)
  • Strong networking knowledge (TCP/IP, DNS, DHCP, routing, firewalls, AWS networking)
  • Proficiency with monitoring and logging solutions (LogicMonitor, Splunk, Nagios, Prometheus, Grafana, ELK Stack, CloudWatch)
  • Experience with storage technologies (SAN, LVM, GlusterFS, Ceph, EBS, EFS, filesystems)
  • Experience with backup solutions (Bacula, Veeam, CommVault, AWS Backup, snapshots)
  • Knowledge of web services (Apache, Nginx), databases (MySQL, PostgreSQL, MongoDB, RDS)
  • Familiarity with version control systems (Git, GitLab, GitHub, CodeCommit)
  • Understanding of security frameworks and compliance standards (PCI-DSS, HIPAA, SOC 2, AWS Well-Architected Framework, CIS benchmarks)
  • Experience with GitOps practices and CI/CD pipelines
  • Knowledge of virtualization technologies (VMware, KVM, Xen)
  • Experience with high availability clustering (Pacemaker, Corosync)
  • Knowledge of package management (RPM, YUM, DNF) and repository management
  • Experience with API integrations and automation workflows
  • Bachelor’s degree in Computer Science, Information Technology, or related field
  • 10+ years of relevant hands-on Linux system administration experience required
  • 3+ years of hands-on experience with Ansible Automation Platform required
  • 3+ years of experience with Red Hat Satellite and OpenShift preferred
  • 3+ years of hands-on experience with AWS Linux EC2 and cloud infrastructure required
  • 2+ years of hands-on experience with AMI creation, management, and automation required
  • Demonstrated experience executing RHEL OS upgrades across multiple major versions required
  • Extensive experience with enterprise Linux patching programs and disaster recovery planning required
  • Proven experience implementing CIS benchmarks and security hardening across Linux environments required
  • Working experience with ITSM tools (ServiceNow, JIRA, Confluence) for ticket management and documentation required

Nice To Haves

  • Master's degree in related field or equivalent combination of education and experience
  • Red Hat certifications (RHCE, RHCA, Red Hat Certified Specialist in Ansible Automation, or OpenShift certifications) strongly preferred
  • AWS certifications (AWS Certified Solutions Architect, AWS Certified SysOps Administrator, or AWS Certified DevOps Engineer) strongly preferred
  • Security certifications (Security+, CISSP, or CIS certification) a plus
  • ITIL Foundation certification a plus

Responsibilities

  • Lead the design, deployment, and maintenance of enterprise Linux server environments (RHEL, CentOS, Ubuntu, SUSE, Amazon Linux) with hands-on configuration and troubleshooting across on-premises and AWS cloud infrastructure
  • Plan, execute, and manage enterprise-wide Linux patching strategies including security patches, kernel updates, and critical vulnerability remediation across thousands of servers
  • Develop and maintain comprehensive disaster recovery (DR) plans for Linux infrastructure including RPO/RTO targets, failover procedures, and recovery testing schedules
  • Implement and enforce CIS (Center for Internet Security) benchmarks and security baselines across all Linux systems including automated compliance scanning, remediation, and reporting
  • Plan, execute, and manage RHEL operating system upgrades across enterprise environments including in-place upgrades (Leapp), migration strategies, and rollback procedures
  • Develop and implement infrastructure automation strategies using Ansible Automation Platform (AAP) including playbook development, workflow orchestration, and automation controller management
  • Manage and optimize Red Hat Satellite infrastructure for system provisioning, patch management, and content lifecycle management across the enterprise
  • Implement and manage automated patching workflows using Red Hat Satellite, Ansible, and AWS Systems Manager for both on-premises and cloud environments
  • Design, deploy, and manage AWS Linux EC2 instances including instance configuration, auto-scaling, and integration with AWS services
  • Create, maintain, and manage AMI (Amazon Machine Image) lifecycle including image hardening, patching, golden image development, and automated AMI pipeline creation
  • Implement AMI versioning strategies, testing procedures, and distribution processes across multiple AWS accounts and regions
  • Design and implement disaster recovery solutions including backup strategies, replication technologies, failover automation, and multi-region/multi-site architectures
  • Design and maintain NFS storage solutions and distributed file systems for enterprise applications
  • Architect, deploy, and manage OpenShift container platforms and Kubernetes environments in hybrid cloud configurations
  • Implement and support Red Hat Dev Spaces for cloud-native development workflows
  • Conduct regular DR drills and testing to validate backup and recovery procedures
  • Develop and maintain security hardening standards based on CIS benchmarks, STIG requirements, and organizational security policies
  • Manage incidents, requests, and change management processes using ITSM tools such as ServiceNow including ticket resolution, escalations, and SLA compliance
  • Maintain technical documentation, knowledge base articles, runbooks, and operational procedures in Confluence
  • Establish and enforce Linux server security standards, hardening procedures, and compliance protocols across on-premises and cloud environments
  • Oversee system performance monitoring, capacity planning, and optimization initiatives across all platforms
  • Provide escalation support for complex technical issues and lead incident response efforts
  • Collaborate with cross-functional teams including networking, storage, security, and application development
  • Drive continuous improvement initiatives and evaluate emerging Red Hat, AWS, and cloud-native technologies
  • Create and maintain comprehensive technical documentation, runbooks, and standard operating procedures
  • Participate in on-call rotation and provide 24/7 support for critical systems as needed
  • Lead vendor management activities and coordinate with Red Hat and AWS support
  • Provide technical mentorship and guidance to Linux administrators and junior team members
  • Lead technical training sessions and knowledge transfer initiatives on Ansible, Satellite, OpenShift, AWS, patching, and DR procedures

Benefits

  • A highly collaborative and supportive environment developed to encourage work-life balance and employee wellness.
  • A hybrid work environment with up to 2 days per week of remote work
  • Tuition Reimbursement to support your continued education
  • Student Loan Repayment Assistance
  • Technology Stipend allowing you to use the device of your choice to connect to our network while working remotely
  • Generous PTO and Parental leave
  • 401k Employer Match
  • Competitive health benefits including medical, dental and vision