AI Quality Engineer

Rootly, Office

About The Position

Rootly is building the AI-native future of incident management, and we need someone who can push our AI to its limits before our customers do. As our AI Quality Engineer, you'll own the evaluation and optimization of Rootly's agentic AI features -- designing test scenarios, running adversarial prompts, interpreting outputs, and working directly with engineering and product to close the loop on performance. This isn't traditional QA. You'll spend your days thinking like an attacker, a confused user, and a power user all at once -- probing how our AI agents reason, make decisions, and handle edge cases across complex incident workflows.

Requirements

  • 5+ years in QA, product operations, AI/ML evaluation, or a closely related role
  • Hands-on experience testing or evaluating LLM-powered or agentic AI products
  • Strong prompt engineering instincts -- you understand how wording, context, and structure affect model behaviour
  • Comfortable writing scripts or working with evaluation tools (Python a plus; not required to be a full-stack engineer)
  • Sharp analytical thinking; you can spot a subtle reasoning failure and articulate exactly why it's a problem
  • Clear written communicator; able to translate AI behaviour findings for both technical and non-technical audiences

Nice To Haves

  • Familiarity with incident management, DevOps, or IT operations workflows is a strong asset
  • Experience with evaluation frameworks (e.g. LangSmith, PromptFlow, Braintrust, or similar)
  • Exposure to red-teaming or adversarial testing of AI systems
  • Comfortable writing E2E tests with Playwright
  • Background working at a B2B SaaS or developer-tools company
  • Familiar with mobile app testing (iOS/Android)

Responsibilities

  • Design and execute prompt-based test scenarios that cover happy paths, edge cases, and adversarial inputs across Rootly's agentic AI features
  • Evaluate AI outputs for accuracy, relevance, consistency, and alignment with expected workflow behaviour
  • Build and maintain an evaluation framework: structured test libraries, scoring rubrics, and regression suites to track AI performance over time
  • Identify failure modes, hallucinations, reasoning gaps, and unexpected agent behaviours; document findings and work with engineers to resolve them
  • Partner with Product and Engineering on new AI feature releases, contributing to acceptance criteria and quality gates before launch
  • Define and track quality metrics (accuracy rates, failure frequency, regression trends) and report findings to stakeholders
  • Stay current on LLM evaluation techniques, prompt engineering best practices, and agentic testing methodologies

Benefits

  • Competitive compensation and early equity in a fast-growing, venture-backed company.
  • Comprehensive medical, dental, and vision coverage.
  • 3 weeks of vacation, plus unlimited sick and mental health days, and a company-wide end-of-year shutdown to recharge.
  • $500 stipend for home office setup.
  • Unlimited token usage and access to AI tools.