About The Position

Grafana Labs is a remote-first, open-source powerhouse. The instantly recognizable dashboards have been spotted everywhere from a NASA launch and Minecraft HQ to Wimbledon and the Tour de France. Grafana Labs also helps more than 3,000 companies -- including Bloomberg, JPMorgan Chase, and eBay -- manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).

We’re scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.

This is a remote opportunity, and we are interested in applicants located in US time zones (EST and CST only at this time).

Staff Backend Engineer - Application Core Services, Stacks

The Opportunity:

Application Core Services (AppCore) partners closely with our Cloud, Enterprise, and Grafana teams to deliver reliable internal and customer-facing systems that power critical parts of the Grafana business. We build on the grafana.com platform to create custom solutions and integrations across the many systems that support a modern software company. The team owns important domain areas that help keep both our customer workflows and internal business processes running smoothly.

AppCore is made up of multiple squads, each focused on one or more of these domains. Our work includes maintaining the billing engine responsible for customer usage calculation, automating provisioning after a customer signs a contract, integrating with cloud marketplaces such as AWS, Azure, and GCP, and building and maintaining the user portal our customers rely on to manage their accounts.

This is a team working at the intersection of product, platform, and business operations. The systems we build are critical to how Grafana scales. We are looking for engineers who enjoy solving complex workflow and systems problems, improving reliability and developer experience, and building software that directly supports both customers and internal stakeholders.

As a company, we are remote-first and global, and we embrace people of different experiences and backgrounds to build diverse teams where every person brings a unique perspective to the software. Engineers at Grafana also have the opportunity to contribute to Open Source communities and collaborate across teams beyond their immediate scope.

Requirements

  • At least 1 year of fully remote work experience
  • Experience working on a large SaaS platform and dealing with common distributed systems problems (e.g. scalability, multi-tenancy, data isolation, HA, …)
  • Professional experience with Golang and willingness to work across both backend service and application code
  • Care deeply about developer and user experience and the quality of the products that you work on
  • Some experience delivering projects in a self-driven way, from gathering requirements and brainstorming ideas to shipping a product into the customer’s hands
  • Write clean, robust, well-tested software that other engineers can understand, operate, and maintain
  • Experience with mentoring junior engineers in a collaborative but asynchronous environment
  • Can take on complex challenges and break them down to achieve tight learning loops: to analyze, design, and build modular solutions, deliver MVPs, gather data and feedback, and then progress iteratively
  • Willing to work across teams. Your work has to be aligned with the needs of other squads and external stakeholders. You make your plans transparent, bring stakeholders on board, and are open to feedback and suggestions
  • Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
  • Experience participating in blameless incident response and writing high-quality post-incident reviews

Nice To Haves

  • Experience with TypeScript/Node.js
  • Experience with Kubernetes control-plane patterns, operators, reconcilers, or desired-state systems
  • Experience with Jsonnet/Tanka, Terraform, Flux, Argo, or similar deployment/configuration tooling
  • Experience working on SaaS provisioning, tenancy, regional expansion, plugin rollout, or customer lifecycle systems
  • Experience with incident response involving configuration drift, partial failure, or cross-service state mismatch

Responsibilities

  • Design, build, and operate reconciliation systems, including the SSS backend, to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
  • Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
  • Improve operational efficiency by reducing deployment complexity (e.g., aiming for single PR regional SSS deployment) and contributing to the Stack Config Reconciliation project
  • Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
  • Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions
  • Improve incident response and recovery paths for stack misalignment, reconciliation failures, plugin rollout issues, and Hosted Grafana integration failures
  • Partner with Product, Hosted Grafana, Infrastructure, Support, and adjacent AppCore squads on customer-impacting stack lifecycle work
  • Contribute to roadmap planning, technical design, OnCall improvements, and long-term simplification of stack operations
  • Own the production behavior of the systems you build, including improving runbooks, dashboards, alerts, reconciliation safety, rollout controls, and recovery procedures
  • Be comfortable debugging across service boundaries and making careful changes in systems that affect customer stacks
  • Participate in our follow-the-sun OnCall rotation
  • Participate in team decisions, such as roadmap planning and prioritization
  • Write efficient, readable, and easy-to-maintain code
  • Design new microservices or systems
  • Collaborate with teammates and other departments to reach consensus on proposed solutions
  • Coordinate with product and UX when needed
  • Respond to customer requests and feedback

Benefits

  • Equity
  • Bonus (if applicable)
  • Restricted Stock Units (RSUs)
  • Global annual leave policy of 30 days per annum
  • 3 days of annual leave entitlement are reserved for Grafana Shutdown Days

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

501-1,000 employees
