Senior Linux Infrastructure Engineer

Northern Light • Somerville, MA

Onsite

About The Position

Northern Light is seeking an experienced Senior Linux Infrastructure Engineer to take hands-on ownership of our Linux-based infrastructure. In this role, you’ll be responsible for operating, maintaining, and evolving a mission-critical environment that runs on a mix of bare-metal and virtualized systems in our private colocation datacenter. This is a highly technical, hands-on position for someone who enjoys working close to the hardware, values operational excellence, and thrives in environments where reliability, security, and scalability matter.

Requirements

7+ years of Linux systems engineering experience, including at least 3 years as the primary owner of infrastructure at scale
Experience with RHEL-based distributions required; Oracle Linux 9 strongly preferred
BS or MS in Computer Science, Computer Engineering, Information Technology, or equivalent practical experience
Demonstrated ability to manage the full server lifecycle end-to-end: OS installation and hardening, configuration management, patching cadence, capacity planning, and decommissioning
Deep proficiency with Linux system internals: kernel tuning, systemd service management, storage subsystems (LVM, RAID, NFS/NAS), and network stack configuration (bonding, VLANs, firewall rules via firewalld/iptables)
Proven experience designing and maintaining high-availability and redundant architectures in production
Direct, hands-on experience with HPE ProLiant (or equivalent enterprise) servers, including iLO/IPMI management, firmware updates, hardware diagnostics, and component replacement
Experience performing all physical datacenter tasks independently: rack and stack, structured cabling, labeling, power management, and hardware lifecycle tracking in a private colo or equivalent environment
Production-level administration of VMware vSphere/ESXi environments (vCenter, VM lifecycle, resource pools, snapshots, HA/DRS) and KVM-based virtualization (libvirt, virt-manager, bridged/NAT networking)
Hands-on experience building and maintaining Ansible playbooks and roles for configuration management, OS hardening, and orchestrated change deployment
Demonstrated experience administering production DNS (BIND), LDAP-backed authentication, mail relay, and SFTP services
Experience managing on-prem package repository infrastructure (Foreman or equivalent) and enforcing controlled patching workflows across a large server fleet
Hands-on experience with enterprise monitoring and alerting platforms (LogicMonitor, Zabbix, Nagios, or equivalent), including building meaningful alerting thresholds and dashboards
Hands-on experience with vulnerability management tools (Tenable, Qualys, or equivalent): scan interpretation, remediation prioritization, and patch compliance tracking
Experience owning backup and disaster recovery operations end-to-end: policy design, Veeam (or equivalent) administration, recovery testing, and DR runbook maintenance
Track record of leading incident response, conducting post-mortems, and producing root-cause analyses with lasting corrective actions
Strong technical documentation discipline (SOPs, runbooks, infrastructure diagrams) and ability to communicate clearly to both technical and non-technical stakeholders
Solid networking fundamentals with hands-on experience collaborating with network teams on switches, firewalls, and load balancers

Nice To Haves

Experience with Red Hat OpenShift Virtualization Engine (OVE), given a potential future migration to that platform
Experience supporting platform migrations (e.g., VMware to open-source hypervisors)
Familiarity with Docker and Kubernetes

Responsibilities

Operate and maintain a Linux infrastructure of ~70 HPE DL360 Gen9/Gen10 bare-metal servers and ~150 virtual machines
Administer RHEL-based systems (primarily Oracle Linux 9), including installation, patching, upgrades, and security hardening
Support virtualization platforms, including:
    VMware (5 nodes, ~100 VMs), with a potential future migration to Red Hat OVE
    KVM-based virtualization supporting Kubernetes workloads (~50 VMs)
Perform on-site datacenter operations in a private six-rack cage, including racking, cabling, labeling, hardware replacement, and decommissioning
Maintain and administer core infrastructure services, including:
    Mail relay (Sendmail)
    DNS (BIND)
    SFTP (ProFTPD)
    LDAP-backed authentication and authorization
    Package repository mirroring (Foreman)
    Centralized automation and orchestration (Ansible Automation Platform)
    On-prem GitLab Premium for version control and CI/CD
Develop and maintain standard operating procedures covering:
    Inventory management (NetBox)
    Monitoring, alerting, and observability (LogicMonitor)
    Incident response and root-cause analysis
    Vulnerability and patch management (Tenable One)
    Backup, recovery, and disaster recovery (Veeam)
Participate in incident response, scheduled maintenance, and post-incident reviews
Collaborate with vendors, service providers, and internal engineering, security, and operations teams
Support secure, compliant infrastructure that enables product delivery and business needs