Staff Backend Engineer - Grafana Enterprise | Canada | Remote

Grafana Labs

57d•CA$186,368 - CA$223,642•Remote

About The Position

Grafana Labs is a remote-first, open-source company with a global user base for its visualization tool, Grafana. The company also supports over 3,000 businesses with its Grafana LGTM Stack, offered as Grafana Cloud (fully managed) or Grafana Enterprise Stack (self-managed). Grafana Labs is experiencing rapid growth while maintaining its core values of open-source legacy, global collaboration, and meaningful work, fostering an innovation-driven environment built on transparency, autonomy, and trust. This is a remote position open to candidates in the US and Canada.

The Grafana Enterprise team focuses on developing innovative solutions for large-scale operators with specific security and regulatory needs. This team works on enhancing the Grafana Enterprise platform, which is part of Grafana Cloud, an integrated suite of observability applications. The team collaborates globally, addressing customer challenges related to security, robustness, flexibility, multitenancy, and interoperability. They engage with major cloud service providers and global enterprises to solve complex distributed systems problems for software engineers, site reliability engineers, and platform operators.

As a remote-only and global company, Grafana Labs values diverse experiences and backgrounds. Meetings are typically scheduled between 14:00 and 17:00 UTC, with flexibility to accommodate engineers' and customers' needs. The company emphasizes openness, helpfulness, and shared success.

The role requires experienced software engineers passionate about distributed systems and building reliable, scalable backend infrastructure. The backend technology stack is Go, and engineers contribute to open-source communities. The company invests in developer productivity, offering AI coding assistants with a funded usage budget and access to frontier models, while maintaining strong code review and quality standards.

Requirements

Deep professional experience writing production services, from ideation through to production operations at scale
Strong distributed systems fundamentals: replication, consistency models, partitioning, fault tolerance, and the trade-offs that come with operating at scale
Demonstrated experience designing and operating systems for large-scale, high-traffic, high-availability, or multi-tenant environments, ideally in the context of infrastructure, observability, or software delivery platforms
Professional experience building and consuming gRPC/protobuf APIs and designing clean service contracts across service boundaries
Strong database skills (PostgreSQL and/or MySQL), including schema design, query optimisation, and schema migrations at scale
Experience with large-scale CI/CD systems and build tooling: designing, operating, or integrating with continuous delivery pipelines that serve large engineering organisations or external operators at scale
Comfort working with Kubernetes and containerised deployment environments, including patterns for operating stateful workloads and multi-tenant clusters
Experience with observability tooling: OpenTelemetry, Prometheus metrics, structured logging, and distributed tracing
Familiarity with dependency injection patterns (e.g., Google Wire) and clean, testable service architecture
You work well as a communicative member of a team of engineering professionals.
You earn trust by saying what you mean and doing what you say.
You are customer focused and especially attuned to the needs of large-scale operators who rely on Grafana as critical infrastructure. You start with their needs and work backwards.
You insist on the highest standards and work to develop the skills and knowledge of your fellow team members.
You take on complex distributed systems challenges, break them down into digestible problems, and leverage your team and organization to deliver.
You design modular solutions, deliver minimum loveable products, gather data and feedback, and then progress iteratively.

Nice To Haves

Experience with TypeScript and React for contributing to frontend features and collaborating closely with frontend engineers
Experience with Grafana's LGTM+ observability stack (Loki, Mimir, Tempo, Pyroscope, Alloy)
Prior experience at or building for large-scale cloud service providers, IaaS providers, or global enterprises with demanding SLA requirements
Experience designing or operating large-scale build infrastructure: artifact registries, distributed build caches, hermetic build systems (e.g., Bazel), or developer platform tooling

Responsibilities

Earning the trust of our large-scale operator customers to further Grafana's "big tent" philosophy of data accessibility and to meet clear business objectives
Designing and leading the development of backend services, distributed systems, and enterprise features at scale
Driving continuous improvement of our engineering culture through words and actions
Driving projects from initial ideation through the development lifecycle to production
Contributing to the scalability, reliability, security, and multi-tenancy of the Grafana platform trusted by some of the world's largest operators
Owning the operational health of our platform by participating in weekday (12h x 5d) and separate weekend (24h x 2d) on-call rotations
Hiring and developing the best engineers to build the future of Grafana
Developing your skills as a thought leader to drive continuous improvement of engineering and operational practices across Grafana Labs
Design and build the backend systems powering Grafana Enterprise — the platform trusted by the world's largest operators to run their observability and software delivery infrastructure.
Hire and develop the best engineers we can to deliver the future of Observability.
Architect and implement distributed backend services in Go, with a focus on correctness, observability, and performance at scale
Design APIs and service contracts used by thousands of enterprise operators and cloud service providers
Collaborate with Product and UX to shape features and partner with frontend engineers to ship complete, end-to-end solutions
Drive scalability and reliability improvements that matter to large-scale operators running Grafana in regulated, high-availability environments
Engage directly with large enterprise customers and cloud service providers to understand their requirements and translate them into robust engineering solutions
Advocate for our customers at every stage of the development lifecycle