About The Position

We are seeking an experienced Grafana Cloud Implementation Specialist with 5+ years of hands-on expertise in designing, deploying, and optimizing observability solutions using Grafana Cloud. The ideal candidate will have strong experience in metrics, logs, traces, dashboard development, alerting, and integrating Grafana with modern cloud-native systems.

Requirements

  • 5+ years of experience implementing and managing Grafana or Grafana Cloud.
  • Strong hands-on expertise with:
      • Grafana Mimir, Loki, Tempo, Prometheus, Alertmanager
      • PromQL, LogQL, and SQL query optimization
      • OpenTelemetry instrumentation
  • Experience with cloud platforms: AWS, Azure, or GCP.
  • Experience with Kubernetes, microservices, and cloud-native observability.
  • Proficiency in DevOps tools: Terraform, Helm, Git, CI/CD pipelines.
  • Strong understanding of SRE principles, monitoring design, and performance engineering.
  • Ability to collaborate with cross-functional teams and drive end-to-end observability adoption.

Nice To Haves

  • Grafana Cloud or Prometheus certifications.
  • Experience designing multi-tenant monitoring solutions.
  • Knowledge of ServiceNow, PagerDuty, Opsgenie, or Jira integrations.
  • Experience with security, RBAC, and compliance frameworks.
  • Experience with Splunk, New Relic, or similar observability tools.
  • Ability to support hands-on implementation in a fast-moving environment and work independently.

Responsibilities

  • Grafana Cloud Implementation & Administration
      • Design, deploy, and manage Grafana Cloud environments for enterprise customers.
      • Configure data sources including Prometheus, Loki, Tempo, InfluxDB, Elasticsearch, and cloud provider monitoring tools (AWS CloudWatch, Azure Monitor, GCP Operations Suite).
      • Implement secure authentication/authorization via SSO, OAuth, LDAP, Azure AD, or enterprise identity solutions.
      • Optimize Grafana architecture for performance, scalability, and cost efficiency.
  • Observability & Monitoring Architecture
      • Build end-to-end observability stacks covering metrics, logs, and traces.
      • Develop robust monitoring strategies aligned with SRE/DevOps practices (SLIs/SLOs/SLAs).
      • Set up alerting (Grafana Alerting, Alertmanager, Loki alerts) with escalation policies.
  • Dashboarding & Visualization
      • Create custom, user-friendly, and visually compelling Grafana dashboards.
      • Work closely with application, infrastructure, and business teams to translate monitoring requirements into actionable visualizations.
      • Standardize dashboard templates across teams.
  • Integration & Automation
      • Integrate Grafana Cloud with CI/CD pipelines, Kubernetes clusters, and cloud platforms.
      • Automate observability deployments using Terraform, Helm, Ansible, or GitOps workflows.
      • Implement instrumentation using OpenTelemetry for distributed tracing.
  • Troubleshooting & Optimization
      • Diagnose issues across metrics, logs, traces, and data source configurations.
      • Tune queries for performance and cost optimization.
      • Ensure high availability and reliability of monitoring systems.
  • Documentation & Training
      • Create detailed documentation for system architecture, runbooks, and best practices.
      • Train internal teams on how to use Grafana dashboards, alerts, and observability workflows.
      • Serve as an SME for Grafana Cloud in cross-functional initiatives.