About The Position

Within Nscale, the Network Operations team is responsible for the performance and reliability of the high-speed networks that underpin our AI platforms. These front-end networks are critical to inference workloads, cluster management, data movement, and storage connectivity.

We’re looking for a Senior Front End Network Engineer – AI Infrastructure to join the team. In this role, you will be responsible for the day-to-day health, stability, and performance of Nscale’s large-scale Ethernet front-end networks. You’ll bring deep operational expertise from hyperscale or high-performance environments and play a key role in incident response, performance tuning, automation, and continuous improvement of production AI networking systems.

Requirements

  • 5+ years of experience in network engineering, with at least 3 years operating large-scale Ethernet data centre or cloud networks
  • Deep, hands-on operational experience with high-speed Ethernet fabrics in hyperscale or production environments
  • Strong expertise with Arista (EOS) and/or Nokia (7220 IXR / 7250 IXR / 7750 SR series) platforms
  • Solid understanding of modern data centre networking, including BGP, OSPF, ECMP, EVPN-VXLAN, and leaf-spine architectures
  • Proven experience with long-haul circuits and DCI (dark fibre, carrier Ethernet, coherent optics)
  • Experience with storage networking over Ethernet and shared storage connectivity
  • Proven ability to troubleshoot complex network issues using Linux-based tooling and fabric diagnostics
  • Proficiency in Python, Go, or shell scripting for automation, data analysis, or configuration management
  • Experience working in a 24/7 operational environment with a strong focus on reliability and toil reduction

Nice To Haves

  • Extensive hands-on experience with Arista or Nokia platforms at scale
  • Deep familiarity with front-end network patterns for large AI clusters (inference traffic, management networks, and storage integration)
  • Experience operating large-scale DCI / long-haul optical or carrier networks
  • Strong background in network observability and telemetry systems (streaming telemetry, sFlow, Prometheus, Grafana, etc.)
  • Prior experience in automation-first network operations or building internal tooling

Responsibilities

  • Owning the operational health, configuration consistency, and performance tuning of large-scale Ethernet front-end fabrics (leaf-spine / Clos) supporting AI inference, management, and storage workloads
  • Leading the diagnosis and resolution of complex network incidents (P0/P1), spanning optics, routing, switching hardware, long-haul circuits, and storage connectivity layers
  • Driving blameless postmortems and implementing preventative fixes to improve long-term fabric stability and availability
  • Partnering with SREs to define requirements for automation and tooling, and contributing to network provisioning, validation, and monitoring systems
  • Collaborating with Network Architecture and Engineering teams to validate designs and enforce standards for routing, congestion management, firmware baselines, and change safety
  • Monitoring fabric utilisation and performance, identifying bottlenecks, and tuning for predictable latency and throughput on front-end networks
  • Acting as a subject matter expert for cross-functional teams on high-speed Ethernet networking, long-haul/DCI circuits, and storage network integration
  • Participating in an on-call rotation supporting mission-critical, customer-facing infrastructure

Benefits

  • Highly competitive package (base + equity) with reviews every 12 months
  • Join the fastest-growing tech startup
  • Dynamic progression plan tailored to your ambitions