Production Engineer, Network

Fluidstack • London, London • $175,000 - $300,000 • Remote

About The Position

Fluidstack is building civilization-scale infrastructure for AI, aiming to deliver tens to hundreds of gigawatts of compute faster than anyone else. This involves rethinking every layer of the stack, from acquiring power and designing data centers to operating them. The company emphasizes high ownership, autonomy, velocity, first-principles thinking, and a passion for the frontier of AI. The Production Engineering Team is focused on building active debugging tooling, an end-to-end network repair pipeline, and a real-time network monitoring platform for a growing fleet across multiple hyperscale datacenter sites.

Requirements

  • Treat toil as a bug; build tools to automate manual processes like diagnosing link failures.
  • Think in systems and understand how network issues propagate.
  • Move toward ambiguity, build maps, and explain them.
  • Learn at a steep slope and reach competence in unfamiliar domains quickly.
  • Run incidents, write postmortems, and fix systemic causes.
  • Be fluent with AI tooling, including LLM APIs, MCP servers, and agentic frameworks.
  • Have shipped production network tooling or automation that other teams depend on.
  • Be comfortable working in any programming language with the help of AI coding tools.

Nice To Haves

  • Network automation and tooling (gNMI, gRPC, NETCONF, SONiC).
  • Link diagnostics or optical network monitoring.
  • RMA and repair lifecycle automation.
  • Large-scale datacenter fabric (BGP, ECMP, spine-leaf).
  • Out-of-band network management.
  • Experience with Go or Python.

Responsibilities

  • Own network fleet health end to end, including defining real-time monitoring requirements, building the alerting lifecycle, and shipping dashboards for network state across all sites.
  • Build active debugging tooling, such as link diagnostics, remote command execution across the fleet, and repair visualization, to quickly resolve network faults.
  • Develop automation for the network repair pipeline, from fault detection through parts management and return to service, including ticket integration and lifecycle pipelines.
  • Own network qualification and validation by building frameworks that gate new sites and hardware into production, defining healthy network criteria before traffic is carried.
  • Ensure the end-to-end reliability, scalability, and operation of the network at scale through automation, tooling, and incident discipline.

Benefits

  • Competitive total compensation package (salary + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.
  • Equity in the form of stock options.