Enterprise Operations Center Specialist - Senior

C3EL•Washington, DC

Onsite

About The Position

C3EL is seeking an Enterprise Operations Center (EOC) Specialist to support our mission in Washington, D.C., at the Department of Transportation (DOT) Headquarters building. The EOC operates 24x7x365, including all Federal holidays, and is responsible for maintaining continuous visibility and availability of enterprise infrastructure and services. The EOC Specialist will lead proactive, real-time monitoring using automated monitoring and alerting tools, triage and validate events from internal systems and external providers (e.g., AT&T), perform directed checks of critical systems, and drive corrective actions in accordance with established incident management processes, SOPs, and runbooks. Day to day, the Specialist monitors systems for events and alerts, identifies problem areas, and coordinates and manages their resolution. The Specialist also applies advanced technical concepts, processes, practices, and procedures to complex technical assignments and leads others in these activities.

Requirements

U.S. citizenship, with the ability to obtain a Public Trust clearance.
Minimum of 5 years of related experience providing leadership and technical support to an enterprise operations center, including monitoring and managing enterprise systems and networks using advanced technologies and tools.
Demonstrated experience serving in an incident command role, leading cross-functional technical teams, and producing RCAs and executive-level communications.

Nice To Haves

Familiarity with ServiceNow and other enterprise ITSM/monitoring platforms.
Strong networking knowledge: routing, switching, and VPNs.
ITIL-based incident and change management experience.
Experience managing vendor escalations and maintaining SLAs.
Proven ability to lead critical incident response and drive post-incident remediation.
Certifications: ITIL Foundation, CompTIA Network+/Security+, CCNA/CCNP, SANS/GIAC or equivalent.

Responsibilities

Early analysis and command-level validation — perform initial technical triage, determine event severity, and coordinate with POCs to confirm impact and scope.
Advanced troubleshooting & diagnostics — execute network and system diagnostics (ping, traceroute, packet captures, router/switch log/interface analysis, host/service health checks); interpret telemetry and correlate multi-source logs to identify root causes or escalation requirements.
Escalate & coordinate resolution — own the escalation path: contact and liaise with DOT Tier III teams, assign and manage ITSM tickets in ServiceNow (create, route, and track), and open/manage tickets with outside vendors (e.g., AT&T). Ensure SLA-driven escalation and follow-through.
Incident Command & communications — initiate and anchor the Critical Incident Management process and Incident Response Bridge; act as Incident Commander or Operations Lead as required, coordinate cross-functional responders, take and distribute bridge notes, and update outage communications in real time.
Technical leadership & decision-making — make authoritative operational decisions during incidents, delegate technical tasks, and direct remediation or containment actions while maintaining chain-of-command communications with senior stakeholders.
RCA ownership & knowledge capture — lead or coordinate Root Cause Analysis (RCA) production: gather forensic data, assign sequential RCA IDs, document findings/actions, identify actionable remediation items, and migrate validated content into the knowledge management repository and SOPs.
Hands-on support & physical data center operations — provide on-site technical support for ExecHelp and Tier III teams during off-hours; perform authorized hands-on interventions at the Data Center, escort un-badged personnel as required, and execute hardware/system-level recoveries.
Process & documentation stewardship — create, update, and enforce SOPs, playbooks, escalation matrices, contact lists, and IMC process documentation; maintain remote site POC and topology data.
Reporting & metrics — generate and distribute operational reports (daily/weekly), executive incident summaries, COE Morning summary report, and KPI dashboards tracking MTTR, MTTD, incident frequency, and SLA compliance.
Mentorship & continuous improvement — mentor junior EOC analysts, lead shift handoffs, drive post-incident reviews, and sponsor automation/prioritization efforts to reduce noise and improve mean-time-to-resolution.