IT Systems Administrator

Positron Corporation • Spokane, WA

Onsite

About The Position

We’re hiring an IT Systems Administrator to own the on-prem environment that powers AI inference development at Positron.ai. You’ll keep development and cloud-accessible inference systems, along with an on-prem compute cluster, reliable, secure, and observable; support remote access (via VPN) for distributed teammates; and be the hands-on owner of server room operations, storage, networking, virtualization, provisioning, and monitoring. This is a high-impact IC role with broad scope across hardware, software, and documentation.

Requirements

5+ years administering Linux systems in a mixed on-prem environment (servers, switches/firewalls, NAS, SAN)
Strong in Ethernet/IP, VLANs, firewalls/VPNs, DNS/DHCP/NTP; confident with Ansible, PXE, Bash, and Git
Hands-on with NFS/NAS, snapshots/replication, and backup/restore drills
Experience with virtualization (Proxmox/KVM/ESXi), VM templating, and host lifecycle management
Monitoring/alerting with Prometheus/Grafana (or equivalent), plus log collection and dashboarding
Clear documentation habits; steady incident responder with on-call experience

Nice To Haves

Tailscale administration; IPsec tunnels; Proxmox clustering and Ceph; L2/L3 switch config (e.g., VLAN trunks, LACP); Terraform; secrets management; hardware automation (Redfish/IPMI)
Familiarity with SLURM or job schedulers; GPU server care and feeding; basic Python for ops tooling

Responsibilities

Server room operations: Rack/unrack servers and network gear; manage cabling; configure PDUs; maintain accurate inventories and diagrams
Storage & backups: Operate and harden NAS; manage NFS exports/mounts; implement/test backup/restore; enforce access controls
Networking: Configure/maintain switches, routers, APs, and firewalls; manage VLANs, VPNs (incl. IPsec), DNS/DHCP/IPAM; monitor performance and security; troubleshoot connectivity; manage primary/backup ISPs; support Tailscale access
Provisioning & config management: Maintain PXE/kickstart/UEFI workflows; automate OS/app configuration with Ansible; keep golden images and templates current
Cluster & job infrastructure: Monitor cluster utilization and job health; troubleshoot failures/perf issues; plan/execute software and hardware upgrades
Virtualization: Administer Proxmox (or similar); create/manage VMs and templates; monitor host/guest performance; triage virtualization issues
Observability & incident response: Operate Prometheus/Grafana (and related exporters/alerts); create actionable alerts; analyze trends; run incident comms and postmortems; schedule and report maintenance windows
Documentation & process: Maintain runbooks, SOPs, topology maps, and asset records (make/model/SN/tags/location/usage); champion repeatable, auditable operations
