Senior Network Operations Engineer - ADC Focus - Federal - 3rd Shift (Nights)

ServiceNow•Santa Clara, CA

Remote

About The Position

Please Note: This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, a criminal/misdemeanor check, and a drug test. Any employment is contingent upon passing the screening. Due to Federal requirements, only US citizens, US naturalized citizens, or US Permanent Residents holding a green card will be considered.

What you get to do in this role: As a Senior Network Operations Engineer, you will help deliver 24x7 support for our Government Cloud infrastructure. This is a 3rd Shift (early a.m.) position with a 5-day work week (Monday - Friday). The working hours for the 3rd Shift are 1:00 am - 10:00 am Pacific Time.

Below are some highlights.
No on-call rotation
Shift bonuses for 2nd and 3rd shifts
Please note that the 3rd shift will eventually move to a 4x10 work week (Sunday to Wednesday OR Wednesday to Saturday)

Requirements

Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
5+ years of network operations experience with a Bachelor's degree; or 3+ years of network operations experience with a Master's degree; or a PhD without experience; or equivalent work experience in network operations, infrastructure engineering, or a similar role supporting large-scale distributed systems.
Strong hands-on experience in a production environment with Application Delivery Controllers (ADCs / load balancers, e.g., F5, NGINX), routing/switching, and security devices (e.g., Palo Alto, Radware).
Solid understanding of network protocols and services, including TCP/IP, BGP, DNS, TLS/mTLS, and VPNs.
Experience managing hybrid and public cloud environments (AWS, GCP, Azure) in an operational capacity.
Proficient in Linux systems administration and troubleshooting.
Familiarity with container technologies (e.g., Docker, Kubernetes) and service mesh architectures.
Experience with monitoring, observability, and alerting tools (e.g., Prometheus, Grafana, Splunk).
Ability to respond to and resolve incidents, including root cause analysis and post-mortems.
Proficiency in infrastructure-as-code and automation tools, such as Ansible, Terraform, and GitLab CI/CD.
Scripting skills in Python, Bash, or similar languages for automation and tooling.
Experience with change management processes in high-availability production environments.
Excellent problem-solving skills and attention to detail, with a bias toward action and automation.
Effective communication and collaboration skills, including cross-functional team engagement.
Willingness to participate in a 24/7 on-call rotation, including weekends.

Responsibilities

Operate and maintain ServiceNow’s global cloud network infrastructure, including application delivery controller (ADC) systems, backbone routing, top-of-rack (TOR) switching, and VPN services.
Troubleshoot and resolve network issues, including urgent operational events.
Participate in a 24/7 on-call rotation, including weekends, as part of the Network Operations Engineering team.
Maintain software-defined, declarative infrastructure at scale using automation tools such as Ansible and GitLab.
Perform software upgrades, version control, and security patching across production systems.
Proactively analyze network metrics such as capacity, latency, and availability to detect and prevent outages.
Support network operations in private and hybrid multi-cloud environments (e.g., Azure, AWS, GCP).
Partner with the Site Reliability Engineering (SRE) team to improve operational processes and reliability.
Review, consult, and prepare for planned changes and releases to the production environment.
Create and maintain detailed documentation of infrastructure, automation, and standard operating procedures.
Provide feedback to infrastructure architects and contribute to design discussions for new initiatives.
Collaborate with peer teams building world-class networking and orchestration solutions.
Evaluate, adopt, and implement new open-source and commercial tools and technologies.
Contribute to processes and automation to build a low-touch, continuous deployment pipeline with near-zero downtime and high success rates.
Drive automation to enable rapid deployment and updates across large-scale environments.