Network Operations Center (NOC) Analyst

Lightning AI
Fort Worth, TX
Onsite

About The Position

Lightning AI is seeking Network Operations Center (NOC) Analysts to support 24/7 operations across select high-performance compute data centers with advanced monitoring infrastructure. This is a technically focused role centered on telemetry analysis, infrastructure monitoring, and independent diagnosis of compute, network, and hardware systems. You will serve as the first line of technical response: analyzing telemetry signals, diagnosing system anomalies, and troubleshooting Linux and network-layer issues before escalating with clear, actionable findings. You will operate with a high degree of independence, applying sound judgment in situations that often extend beyond predefined runbooks.

This role offers a clear pathway toward positions in network engineering, site reliability, or data center operations, and the opportunity to work with next-generation AI hardware and some of the most advanced compute infrastructure deployed today.

This role is based onsite at one of our data center facilities in Lisle, IL; Fort Worth, TX; or Quincy, WA. Shift flexibility is required to support our 24/7 operations environment. We are not able to provide visa sponsorship for this position at this time.

Requirements

  • Hands-on Linux experience including command-line proficiency and system log analysis
  • Practical understanding of networking concepts: TCP/IP, DNS, routing, and diagnostic tools (ping, traceroute, netstat, tcpdump)
  • Ability to independently diagnose technical issues and exercise sound judgment in ambiguous situations
  • Clear, precise communication skills with strong technical documentation ability
  • Availability to work overnight and rotating shifts in a 24/7 environment

Nice To Haves

  • Experience with Grafana, Datadog, or Prometheus
  • Familiarity with HPC or GPU-based infrastructure
  • Scripting experience in Bash or Python

Responsibilities

  • Monitor data center systems using telemetry data, dashboards, and alerting tools to detect anomalies and emerging issues
  • Perform independent technical diagnosis across Linux systems, network connectivity, and hardware health using command-line tools, logs, and diagnostic utilities
  • Troubleshoot network-layer issues including connectivity, routing, and interface errors
  • Triage and escalate incidents to the appropriate teams (hardware, network, SRE) with technically accurate summaries, relevant logs, and telemetry findings
  • Create and maintain detailed tickets documenting diagnostic steps, technical findings, and observed system behavior
  • Identify recurring alert patterns through telemetry analysis and surface findings to improve monitoring coverage and reliability

Benefits

  • Comprehensive medical, dental and vision coverage (U.S.)
  • Private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.)
  • Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional development support
  • Wellness and work-from-home stipends
  • Flexible work environment