System Administrator, Data Centers

Utilidata
Ann Arbor, MI
$120,000 - $150,000
Hybrid

About The Position

Utilidata is a fast-growing, NVIDIA-backed edge AI company that enables greater visibility into and control of power utilization in energy-intensive infrastructure such as the electric grid and data centers. Karman, the company’s distributed AI platform powered by a custom NVIDIA module, is transforming the way utility companies operate the grid edge and will enable data centers to unlock more compute for the same provisioned power.

The System Administrator is responsible for the day-to-day operational support, maintenance, and reliability of Karman systems deployed in high-density data center environments. The role focuses on Linux systems administration, system health, uptime, and operational excellence. This is not an architecture or design role; it is a hands-on position responsible for keeping production systems stable, secure, and optimized.

This position is based onsite at our company headquarters in Ann Arbor, Michigan, with flexibility for occasional remote work. Candidates will be expected to collaborate cross-functionally with remote teams based across the country.

Requirements

  • 5+ years of hands-on Linux systems administration experience in production environments
  • Strong experience with Linux installation, configuration management, patching, and performance tuning
  • Experience supporting systems in data center or high-availability environments
  • Solid understanding of system monitoring, log management, and troubleshooting methodologies
  • Experience working within established infrastructure standards and operational processes
  • Comfortable working in a fast-paced, operationally focused environment

Nice To Haves

  • Experience with configuration management or automation tools (Ansible, Puppet, Chef, or similar)
  • Experience with monitoring/observability tools (Prometheus, Grafana, ELK stack, or similar)
  • Familiarity with containerized environments (Docker, Kubernetes)
  • Exposure to high-performance computing, AI infrastructure, or GPU-based systems
  • Familiarity with security frameworks such as NIST, CIS benchmarks, or SOC 2 controls
  • Experience with SIEM platforms (Splunk, Sentinel, or similar)

Responsibilities

  • Perform day-to-day Linux systems administration across production environments (RHEL, Ubuntu, CentOS, or similar)
  • Install, configure, patch, upgrade, and maintain Linux servers and associated infrastructure
  • Monitor system performance, availability, and resource utilization; proactively address issues before impact
  • Troubleshoot and resolve operating system, application, and infrastructure-related incidents
  • Execute established deployment procedures and configuration standards for Karman systems
  • Support rack-based systems, including hardware checks and coordination around PDUs and PSUs as needed
  • Maintain system security through patching, access control management, and adherence to best practices
  • Perform log analysis, root cause analysis, and incident documentation
  • Contribute to documentation of operational runbooks, troubleshooting guides, and system procedures
  • Support Tailscale network access, user accounts, and permission policies
  • Support the observability stack (Prometheus/Grafana), including dashboards, user access management, metrics collection, and alerting
  • Provide internal technical support to engineering teams, including general escalations and package install requests
  • Work with IT and security teams on vulnerability scanning, remediation timelines, and patch prioritization
  • Support change management processes, coordinating with IT on network, firewall, and access control changes affecting production environments
  • Participate in security reviews, audits, and tabletop exercises as needed
  • Coordinate with IT on asset lifecycle management, including provisioning, decommissioning, and inventory tracking for data center hardware
  • Coordinate with the SOC on security incident detection, triage, and response for Karman production systems
  • Participate in the on-call rotation to support system uptime and rapid incident response

Benefits

  • A flexible work environment with flexible paid time off
  • Competitive compensation and benefits, including health, dental, vision, and employer-match 401(k)

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

1-10 employees
