Enterprise Operations Center Specialist - Senior

SAIC, Washington, DC
Onsite

About The Position

SAIC is seeking a Senior Enterprise Operations Center (EOC) Specialist to support our government customer. This position is located at the Department of Transportation (DOT) Headquarters Building in Washington, DC. The EOC operates 24 hours a day, 7 days a week, including all Federal holidays. The specialist will use appropriate monitoring tools and follow standard incident management processes.

Requirements

  • Early analysis and command-level validation
  • Advanced troubleshooting & diagnostics
  • Escalation & resolution coordination
  • Incident Command & communications
  • Technical leadership & decision-making
  • RCA ownership & knowledge capture
  • Hands-on support & physical data center operations
  • Process & documentation stewardship
  • Reporting & metrics
  • Mentorship & continuous improvement
  • Experience with monitoring tools and incident management processes
  • Experience with automated monitoring/alerting platforms
  • Experience with AT&T or similar external providers
  • Experience with ServiceNow
  • Experience with ITTSM tickets
  • Experience with Root Cause Analysis (RCA)
  • Experience with knowledge management repositories and SOPs
  • Experience with data center operations
  • Experience with SOPs, playbooks, escalation matrices, contact lists, and IMC process documentation
  • Experience generating operational reports and KPI dashboards

Responsibilities

  • Perform the day-to-day activities required to monitor systems for events and alerts.
  • Coordinate and manage the resolution of events and alerts.
  • Monitor and identify problem areas and coordinate their resolution.
  • Apply advanced technical concepts, processes, practices, and procedures to complex technical assignments, and lead others in these activities.
  • Lead and supervise proactive, real-time monitoring of enterprise infrastructure and services using automated monitoring/alerting platforms.
  • Triage and validate events from automated tools and external providers (e.g., AT&T), perform directed checks of critical systems, and drive corrective actions per SOPs and runbooks.
  • Perform initial technical triage, determine event severity, and coordinate with POCs to confirm impact and scope.
  • Execute network and system diagnostics (ping, traceroute, packet captures, router/switch log/interface analysis, host/service health checks); interpret telemetry and correlate multi-source logs to identify root causes or escalation requirements.
  • Own the escalation path: contact and liaise with DOT Tier III teams; assign and manage ITTSM tickets in ServiceNow (create, route, and track); and open and manage tickets with outside vendors (e.g., AT&T). Ensure SLA-driven escalation and follow-through.
  • Initiate and anchor the Critical Incident Management process and Incident Response Bridge; act as Incident Commander or Operations Lead as required, coordinate cross-functional responders, take and distribute bridge notes, and update outage communications in real time.
  • Make authoritative operational decisions during incidents, delegate technical tasks, and direct remediation or containment actions while maintaining chain-of-command communications with senior stakeholders.
  • Lead or coordinate Root Cause Analysis (RCA) production: gather forensic data, assign sequential RCA IDs, document findings/actions, identify actionable remediation items, and migrate validated content into the knowledge management repository and SOPs.
  • Provide on-site technical support for ExecHelp and Tier III teams during off-hours; perform authorized hands-on interventions at the Data Center, escort un-badged personnel as required, and execute hardware/system-level recoveries.
  • Create, update, and enforce SOPs, playbooks, escalation matrices, contact lists, and IMC process documentation; maintain remote site POC and topology data.
  • Generate and distribute operational reports (daily and weekly), executive incident summaries, the COE morning summary report, and KPI dashboards tracking MTTR, MTTD, incident frequency, and SLA compliance.
  • Mentor junior EOC analysts, lead shift handoffs, drive post-incident reviews, and sponsor automation/prioritization efforts to reduce noise and improve mean-time-to-resolution.