Production Engineer, IaaS

Fluidstack, Austin, TX
$175,000 - $300,000

About The Position

Fluidstack is building civilization-scale infrastructure for AI, aiming to deliver tens to hundreds of gigawatts of compute faster than anyone else. That means rethinking every layer of the stack, from acquiring power to designing, building, and operating data centers, with teams spanning hardware and software. The company hires people who are deeply committed to this mission and who operate with high ownership, full autonomy, velocity, and a first-principles approach.

The Production Engineering team is central to this effort. It works on critical problems such as building an observability platform for a large fleet of devices, designing the API surface for production infrastructure, and integrating fleet state as a source of truth across platforms. A Production Engineer, IaaS owns key parts of this infrastructure and is responsible for its reliability, scalability, and efficiency.

Requirements

  • Treat toil as a bug; build solutions to eliminate manual intervention.
  • Design APIs that age well and avoid leaky abstractions at scale.
  • Embrace ambiguity, build maps, and communicate them clearly.
  • Learn new domains and achieve competence rapidly.
  • Handle pager duty, run incidents, write postmortems, and fix systemic causes.
  • Be fluent with AI tooling such as LLM APIs, MCP servers, and agentic frameworks, and use AI coding tools daily.
  • Have shipped production services that other teams depend on at scale.
  • Be comfortable working in any language with the help of AI coding tools.

Nice To Haves

  • Distributed systems and data pipeline engineering.
  • Experience with time-series observability stacks (Prometheus, Thanos, VictoriaMetrics).
  • API design and versioning at scale.
  • Experience with workflow and orchestration engines (Temporal, Cadence).
  • Familiarity with BMC/Redfish or hardware telemetry.
  • Proficiency in Go, Python, and Postgres.

Responsibilities

  • Own the observability platform, including data pipelines, decoration and correlation engines, and healthcheck frameworks to make the fleet legible from site down to device and link.
  • Define and build the API surface for infrastructure, designing contracts between production infrastructure and all tools that interact with it.
  • Build the production control plane, encompassing unified machine management, actual state inspection, and distributed command execution, supported by Kubernetes-based infrastructure.
  • Own fleet state as a source of truth, ensuring alignment with SLOs, site lifecycle state, and integration with internal and customer-facing platforms.
  • Manage the clean integration of new hardware into the platform, including ZTP, DHCP, DNS, and artifacts for new XPU generations and site integrations.

Benefits

  • Competitive total compensation package (salary + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.
  • Equity in the form of stock options.