Senior Network Operations Manager

Black Mountain Dynamics • Mountain View, CA

About The Position

The Senior Network Operations Manager owns the reliability, performance, and continuous operation of a global enterprise network supporting the IT operations organization of a leading autonomous-mobility client. This is a management role: you will lead the people, processes, and standards that keep critical network infrastructure running, bridging the gap between new-site deployment and long-term steady-state reliability. You will move the team beyond reactive firefighting by owning operational acceptance of newly deployed infrastructure, directing the response to major incidents, and maturing the incident, change, and problem-management disciplines that govern a high-availability environment. You will set the operational strategy, develop the engineers who execute it, and serve as the senior escalation point and primary operational interface to client stakeholders. Success in this role is measured by network availability, mean time to repair, the maturity of your team and its runbooks, and the confidence of the client’s IT leadership in day-to-day operations.

Requirements

8+ years in network engineering, enterprise deployment, or high-velocity network operations, including 3+ years in a formal people-management or team-lead capacity.
Proven track record leading NOC or network operations teams in a 24/7, high-availability environment.
Demonstrated ownership of network operations within critical infrastructure carrying high-availability requirements (99.99%+ uptime).
Strong hands-on background configuring and troubleshooting multi-vendor network devices (e.g., Cisco, Juniper, Arista, Palo Alto, Fortinet) via CLI and cloud-managed controllers, with enough depth to lead engineers credibly and make sound architectural calls.
Practical, ownership-level command of ITIL Incident, Change, and Problem management.
Excellent verbal and written communication; able to translate technical detail into business-impact narratives for cross-functional and client stakeholders.

Nice To Haves

Familiarity with network automation tooling (e.g., Python, Ansible, Terraform, NetBox, Jinja2) and how to apply it to deploy and audit infrastructure at scale.
Working knowledge of BGP peering, OSPF, EVPN-VXLAN, stateful firewall policy, and complex traffic engineering.
Experience operating in data center, fleet, mission-critical, or autonomous / high-technology environments.
Experience delivering operations as an embedded contractor or through an MSP relationship.
B.S. in Computer Engineering, Electrical Engineering, Computer Science, or equivalent practical experience. Certifications such as CCNP/CCIE, PCNSE, JNCIP, ITIL, or PMP are a strong plus.

Responsibilities

Lead the operations team: Manage, mentor, and develop the Tier 1/2 NOC and network engineering staff supporting the account; own hiring, onboarding, performance, and career development.
Own staffing models, on-call rotations, and shift scheduling to guarantee continuous coverage of a round-the-clock operation without single points of failure.
Serve as the senior escalation owner and operational decision-maker; hold the team accountable for SLA attainment, quality, and adherence to standards.
Forecast workload and headcount needs, and make the case for resourcing to both Black Mountain Dynamics and client leadership.
Own the availability and performance targets for the enterprise network (99.99%+ uptime), and be accountable for the metrics behind them.
Drive down Mean Time to Repair (MTTR) through better tooling, telemetry, escalation paths, and post-incident action tracking.
Define what “good” looks like for monitoring, alerting, and telemetry, and ensure the team can see and act on network health proactively.
Produce clear operational reporting (availability, incident trends, SLA performance, risk) for client and internal leadership.
Own the Network Acceptance Testing (NAT) framework, ensuring newly deployed infrastructure meets security, scalability, and observability standards before production sign-off.
Direct the hypercare phase following new site launches and major upgrades; ensure anomalies are stabilized and infrastructure is cleanly handed over to steady-state operations.
Oversee the authoring and review of high-risk Methods of Procedure (MOPs) for installing, staging, and upgrading firewalls, core switches, wireless access points, and UPS systems.
Ensure high-availability network operations across critical facilities (e.g., automated data centers, localized data-ingress hubs, and fleet maintenance facilities), accounting for power, cooling, and structured-cabling constraints.
Ensure that high-performance pipelines built for massive data ingress and egress (such as local vehicle and fleet data offloading) run without network bottlenecks.
Own the response to P0/P1 disruptions: coordinate the technical bridge, drive rapid service restoration, and communicate business impact to stakeholders in real time.
Own the change-management process for the account; chair or represent operations in change review, ensuring every change is risk-assessed to minimize production downtime.
Run the problem-management program: ensure Post-Incident Reviews (PIRs) are completed, chronic architectural weaknesses are identified, and permanent remediation is tracked to closure.
Champion the shift from legacy, manual configuration toward automated, template-driven architectures to improve consistency and reduce MTTR.
Own the library of runbooks, configuration baselines, and troubleshooting playbooks that uplift the capability of Tier 1/2 NOC agents.
Act as the primary operational point of contact for the client’s IT operations leadership; translate complex network issues into clear, business-impact summaries.
Manage relationships with hardware vendors, carriers, and support partners, holding them to their SLAs and escalating effectively.