Senior AI Engineer - Grafana Ops, AI/ML | Canada | Remote

Grafana Labs
CA$129,392 - CA$217,128 | Remote

About The Position

Grafana Labs is seeking a Senior AI Engineer to join its AI/ML team. This is a remote position open only to applicants in Canadian time zones. The AI teams at Grafana play a crucial role in helping users understand complex observability data through AI-driven features, aiming to reduce toil, lower the barrier of domain expertise, and surface meaningful signals. The team operates with a high degree of autonomy and ownership, empowering engineers to make decisions and move quickly. The ideal candidate has a strong software engineering background, a quick-iteration mindset, and a passion for experimentation, with a focus on shipping and scaling impactful features. This role involves developing, testing, and shipping AI-powered features that improve infrastructure and observability quality through automation, and expanding AI agent capabilities for incident response. There is significant opportunity to expand or redefine the role based on impact and initiative.

Requirements

  • Experience with LLMs, prompt engineering, and building applications powered by GenAI.
  • Proven track record of delivering software that made it into production and is actively used.
  • Exposure to working in cloud-native environments (e.g., AWS, GCP, Azure).
  • Experience using observability tools to understand and troubleshoot system behavior.
  • Strong engineering skills: Solid experience building production software systems (backend and/or full-stack).
  • You’re a self-starter, capable of tackling complex engineering problems with minimal supervision.
  • AI experience with a practical mindset: Familiarity with AI technologies and frameworks, focusing on delivering high-quality solutions that work in the real world.
  • Quick iteration and experimentation: Comfortable releasing prototypes, collecting feedback, and iterating with a pragmatic mindset.
  • Proven initiative: Take ownership and drive projects forward, pushing boundaries to find the most impactful solutions. Ability to deal with ambiguity and define scope where things are loosely defined.
  • Collaborative attitude: Communicate effectively with peers, product managers, and designers. Open to feedback and bring a solutions-oriented mindset.

Nice To Haves

  • Experience building or working with agent frameworks or multi-agent workflows.
  • Experience with infrastructure/DevOps tooling for deployments: Kubernetes, Docker, Terraform, or similar.
  • Familiarity with model fine-tuning techniques.
  • Experience building observability tooling.

Responsibilities

  • Build and deliver AI solutions: Take ownership of developing high-performance AI features to help users detect, triage, and resolve incidents using observability data and tools.
  • Rapid experimentation and iteration: Implement a highly iterative process where you quickly prototype, test, and validate with real users, including shipping and evolving LLM- or agent-powered workflows for incident lifecycle management and automated analysis tasks.
  • Collaborate cross-functionally: Work with data analysts, product managers, and designers to shape AI-driven product features, including integration of agentic components with internal tools, alerting systems, runbooks, and developer workflows.
  • Utilize AI tools effectively: Use AI and automation tools to enhance both product functionality and your own development workflows.
  • Effective communication: Communicate clearly and contribute across teams in a highly dynamic and collaborative environment.
  • Ownership and impact: Take full ownership of the AI solutions you develop, ensuring they are innovative, scalable, maintainable, and aligned with real user workflows.

Benefits

  • 100% Remote, Global Culture
  • Scaling Organization
  • Transparent Communication
  • Innovation-Driven
  • Open Source Roots
  • Empowered Teams
  • Career Growth Pathways
  • Approachable Leadership
  • Passionate People
  • In-person onboarding
  • Global annual leave policy of 30 days per annum
  • 3 days of annual leave entitlement are reserved for Grafana Shutdown Days
  • Restricted Stock Units (RSUs)