IT Systems Administrator

Positron Corporation, Spokane, WA
Onsite

About The Position

We’re hiring an IT Systems Administrator to own the on-prem environment that powers AI inference development at Positron.ai. You’ll keep our development and cloud-accessible inference systems, as well as an on-prem compute cluster, reliable, secure, and observable; support remote access (via VPN) for distributed teammates; and be the hands-on owner of server room operations, storage, networking, virtualization, provisioning, and monitoring. This is a high-impact individual contributor (IC) role with broad scope across hardware, software, and documentation.

Requirements

  • 5+ years administering Linux systems in a mixed on-prem environment (servers, switches/firewalls, NAS, SAN). Strong in Ethernet/IP, VLANs, firewalls/VPNs, DNS/DHCP/NTP; confident with Ansible, PXE, Bash, and Git
  • Hands-on with NFS/NAS, snapshots/replication, and backup/restore drills
  • Experience with virtualization (Proxmox/KVM/ESXi), VM templating, and host lifecycle management
  • Experience with monitoring/alerting in Prometheus/Grafana (or equivalent), plus log collection and dashboarding
  • Clear documentation habits; steady incident responder with on-call experience

Nice To Haves

  • Tailscale administration; IPsec tunnels; Proxmox clustering and Ceph; L2/L3 switch config (e.g., VLAN trunks, LACP); Terraform; secrets management; hardware automation (Redfish/IPMI)
  • Familiarity with SLURM or job schedulers; GPU server care and feeding; basic Python for ops tooling

Responsibilities

  • Server room operations: Rack/unrack servers and network gear; manage cabling; configure PDUs; maintain accurate inventories and diagrams
  • Storage & backups: Operate and harden NAS; manage NFS exports/mounts; implement/test backup/restore; enforce access controls
  • Networking: Configure/maintain switches, routers, APs, and firewalls; manage VLANs, VPNs (incl. IPsec), DNS/DHCP/IPAM; monitor performance and security; troubleshoot connectivity; manage primary/backup ISPs; support Tailscale access
  • Provisioning & config management: Maintain PXE/kickstart/UEFI workflows; automate OS/app configuration with Ansible; keep golden images and templates current
  • Cluster & job infrastructure: Monitor cluster utilization and job health; troubleshoot failures and performance issues; plan/execute software and hardware upgrades
  • Virtualization: Administer Proxmox (or similar); create/manage VMs and templates; monitor host/guest performance; triage virtualization issues
  • Observability & incident response: Operate Prometheus/Grafana (and related exporters/alerts); create actionable alerts; analyze trends; run incident comms and postmortems; schedule maintenance windows and report on their outcomes
  • Documentation & process: Maintain runbooks, SOPs, topology maps, and asset records (make/model/SN/tags/location/usage); champion repeatable, auditable operations