About The Position

Technical Product Managers at Nscale own the definition, delivery, and ongoing evolution of a slice of the Nscale platform, partnering with engineering, design, and go-to-market to turn customer and operational problems into shippable outcomes. As a Senior Technical Product Manager for Observability, you own the platform that gives customers and internal operators real-time visibility into their GPU fleet: the telemetry pipeline that scrapes data from physical infrastructure, the aggregation and storage layer, and the observability surfaces (logs, metrics, and traces) that enable fleet management, incident response, and alerting at scale. You partner daily with Fleet Software, Network Engineering, Data Centre Operations, and customer teams to make fleet health visible, actionable, and reliable as Nscale scales from a handful of deployments to a globally distributed fleet.

Requirements

  • 5–8 years in product management, with a track record owning significant areas in observability, infrastructure, or operations-facing products.
  • Demonstrated experience building observability stacks: you have owned a product that captures and surfaces logs, metrics, and traces at scale, and you understand the architectural and UX tradeoffs involved.
  • Hands-on experience with Prometheus, Loki, Mimir, Datadog, Grafana, or OpenTelemetry.
  • Experience with deployment tooling in a data centre or infrastructure context, including provisioning workflows, networking automation, or zero-touch deployment pipelines.
  • Experience building for operators and delivery teams (design engineers, project controllers, PMs, SREs, DC technicians) and a genuine appetite for their workflows.
  • Strong technical fluency: you can lead architecture and trade-off discussions across telemetry pipelines, time-series storage, alerting systems, and observability integrations.
  • A record of moving ambiguous operational problems to shipped outcomes that measurably improve visibility, incident response, or fleet reliability.
  • Excellent written and verbal communication across engineers, operators, and executives.

Nice To Haves

  • Broader experience in the observability domain, across toolsets beyond the stack listed above.
  • Familiarity with bare-metal provisioning tools (OpenStack Ironic, MAAS, or similar) or network automation tooling (NetBox, Nautobot, or similar).
  • Degree in CS or engineering, or prior experience as an engineer, SRE, or infrastructure operator.
  • Familiarity with GPU or accelerated compute infrastructure, data centre operations, or hyperscaler-style deployment at scale.
  • Familiarity with ITSM tooling: Jira Service Management, ServiceNow, Zendesk, or Freshservice.
  • Experience in high-growth environments where the product is being built alongside the fleet it monitors.

Responsibilities

  • Own the roadmap for Nscale's observability platform: the telemetry pipeline, log and metrics aggregation, trace collection, and the customer-facing APIs and dashboards that surface fleet health to customers and operators.
  • Define how logs, metrics, and traces are captured from physical infrastructure, aggregated, and surfaced through the observability platform to enable customers to manage their fleet and handle incidents.
  • Own alerting strategy and optimisation: define what matters, reduce noise, and ensure the right signal reaches the right person at the right time.
  • Capture and prioritise new telemetry requirements as the fleet scales, working with engineering to extend coverage across new hardware, sites, and deployment types.
  • Shadow incident reviews and site operations to turn recurring manual effort and visibility gaps into platform capabilities.
  • Define and drive the metrics that matter: alert signal-to-noise ratio, time-to-detect, time-to-resolve, telemetry coverage, and platform reliability.
  • Mentor junior PMs and raise the bar for PRDs, reviews, and product decisions across the team.

Benefits

  • medical
  • dental
  • vision
  • flexible paid time off
  • parental leave
  • retirement plan participation