NOC Engineer / NOC Lead

STN Inc
San Francisco, CA
Remote

About The Position

The NOC Engineer operates STN's 24/7 monitoring and first-response capability for GPU One (GPUaaS) infrastructure. The role triages alerts, executes documented runbooks, and coordinates with on-call specialists during incidents to protect customer SLAs.

Requirements

  • 3+ years in a NOC, SOC, or IT operations function
  • Hands-on experience with monitoring tools (Datadog, Prometheus, Grafana, PagerDuty, or equivalent)
  • Strong Linux and basic networking fundamentals
  • Excellent written and verbal communication, particularly under pressure
  • Willingness and ability to work rotating shifts including overnight coverage

Nice To Haves

  • GPU, HPC, or large-scale cloud infrastructure background
  • ITIL Foundation certification
  • Demonstrated on-call and major-incident response experience
  • Scripting skills (Python, Bash) for runbook automation

Responsibilities

  • Monitor infrastructure alerts, customer SLA dashboards, and system health on a 24/7 basis
  • Triage incidents and engage on-call SREs, Network, Hardware, or Field Engineering as needed
  • Execute documented runbooks for common platform, network, and hardware issues
  • Manage the incident lifecycle including initial customer notification and status updates
  • Coordinate planned maintenance windows and change windows with internal teams and customers
  • Update status pages and customer-facing communications during incidents
  • Maintain shift handoff documentation and active-incident logs
  • Work the support ticket queue, including Tier 1 ticket resolution
  • Contribute to continuous improvement of monitoring coverage, alert quality, and runbooks
  • Work rotating shifts including nights, weekends, and holidays