About The Position

Grafana Labs is a remote-first, open-source powerhouse. We build Tempo, the open-source distributed tracing backend behind Grafana Cloud Traces and Grafana Enterprise Traces (GET). Tempo makes it easy to search traces, generate metrics from spans, and connect tracing data with logs, metrics, and profiles across the Grafana stack.

2026 is an inflection point for Tempo. After a major architectural upgrade and the launch of TraceQL metrics, we are shifting from foundational work to product and operational excellence, and evolving Tempo from a SaaS database into a platform that powers Grafana’s next generation of observability products (App Observability, Asserts, Traces Drilldown, and AI-driven assistants).

Over the next year, you will help us:

  • Make Grafana Cloud Traces “just work” for customers by eliminating rough edges, confusing limits, and hidden failure modes.
  • Achieve operational excellence at scale as we grow from close to 50 cells today into triple digits this year, with autoscaling, parameterized rollouts, and aggressive toil reduction.
  • Evolve Tempo into a platform enabler: higher-density APIs, trace aggregation, TraceQL metrics math, and machine/LLM-friendly interfaces that downstream products and agents can build on.
  • Push performance further: faster query latency at hundreds of MB/s ingestion and performant 30-day query ranges to match competitors.
  • Prepare Tempo for an agent-driven world: larger, burstier, higher-cardinality workloads, and new categories of AI-powered workflows, such as assistant-driven triage and “why is this slow?”-style investigations.

Requirements

  • Technical leadership. A track record of leading complex, multi-quarter initiatives that spanned design, delivery, and operations, and made the teams around you better.
  • Deep systems experience. Substantial hands-on experience building and operating distributed data systems in production: ingestion pipelines, storage engines, query execution, or similar.
  • Strong software craftsmanship. You write clean, robust, performant software that others can maintain, and you know when to optimize vs. when to ship.
  • Strong Go, or a path to it. We write Tempo in Go. Deep experience in other systems languages (Rust, C, C++) translates well.
  • Operational mindset. You’ve owned production services, carried a pager, reduced toil, and treated SLOs as a product feature, not a chore.
  • Customer focus and pragmatism. You break complex problems into short feedback loops: analyze, design, deliver an MVP, learn, iterate.
  • Leadership through writing and collaboration. You lead through design docs, reviews, and shipped code, not hierarchy. You communicate clearly in a fully remote, asynchronous environment.

Nice To Haves

  • Experience with tracing, OpenTelemetry, or large-scale observability systems.
  • Experience designing query languages, SQL/TraceQL-like engines, or APIs intended to be consumed programmatically (by services or agents).
  • Experience with columnar storage formats (e.g., Parquet) or purpose-built on-disk formats for analytical workloads.
  • Experience operating multi-tenant, multi-cell SaaS infrastructure at scale on Kubernetes.
  • Experience building for AI/LLM consumers: structured APIs, metadata/discovery endpoints, deterministic outputs, evaluation harnesses.
  • Open-source contribution or maintainership, and comfort engaging a community in the open.
  • Experience as an on-call user of Grafana, Prometheus, Loki, or Tempo in a previous role (or on a homelab).
  • Experience in a fully remote, globally distributed team.

Responsibilities

  • Set technical direction on the hardest problems in our roadmap and raise the bar across the team.
  • Lead multi-quarter technical initiatives from problem framing through rollout, e.g., trace aggregation APIs, Limitless Tempo, autoscaling cells and customer limits, or query engine improvements.
  • Own the architecture of core Tempo components: ingestion, storage, query, and metrics generation. Drive design reviews, make sharp trade-offs on performance, cost, and complexity, and document the “why” for the team.
  • Design APIs for humans and agents. Shape the next generation of Tempo’s interfaces (structured, deterministic, discoverable) so that Act 3 products, LLM-driven assistants, and external integrators can build on Tempo reliably.
  • Drive operational excellence. Own outcomes against concrete SLOs (P99 write latency, incident recurrence, TCO per ingested GB) and push the team toward Zero Ops through automation, parameterized rollouts, and actionable alerts.
  • Partner with Product and sibling teams. Work closely with PMs and with App Observability, Asserts, Drilldown, and Grafana Assistant teams to understand how Tempo gets consumed and to ship what unblocks them.
  • Mentor engineers. Raise the engineering bar through code review, design feedback, pairing on hard problems, and writing that leaves the team smarter than you found it.
  • Participate in on-call for the services you help build, and be a force multiplier in incident response and post-incident learning.
  • Contribute to open source. Tempo is OSS. You will engage the community, review external contributions, and help steer the project in the open.
  • Use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction.
  • Engage in pragmatic AI-assisted development: faster prototyping, test generation, refactors, documentation, and incident follow-ups, always paired with strong code review and quality standards.
  • Work on projects such as trace aggregation and higher-density APIs, autoscaling end to end, agent-scale ingestion and query, query performance, rollouts and multi-cell operations, limits and self-service.

Benefits

  • Restricted Stock Units (RSUs)
  • 100% Remote, Global Culture
  • Scaling Organization
  • Transparent Communication
  • Innovation-Driven
  • Open Source Roots
  • Empowered Teams
  • Career Growth Pathways
  • Approachable Leadership
  • Passionate People
  • In-Person onboarding
  • Global annual leave policy of 30 days per annum
  • 3 days of annual leave entitlement are reserved for Grafana Shutdown Days

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

501-1,000 employees
