Lead Quality Engineer - AI

Wolters Kluwer · Coppell, TX
Hybrid

About The Position

We are seeking a Lead AI Quality Engineer to ensure the quality, reliability, and trustworthiness of AI-powered product experiences within Wolters Kluwer Tax and Accounting. This role goes beyond validating that buttons click: you will design tests that confirm the system behaves correctly, measuring retrieval accuracy, citation correctness, and overall alignment of responses with user intent. You will be a key contributor in helping us deliver a system customers can trust.

Requirements

  • Bachelor's degree in Computer Science or equivalent
  • 5+ years of experience in software testing, quality engineering, or equivalent engineering roles, with a focus on validation and reliability
  • Experience with AI evaluation frameworks (e.g., LlamaIndex evals, OpenAI Evals, Ragas, TruLens, or custom harnesses)
  • Strong skills in Python testing frameworks (Pytest, unittest, or equivalent)
  • Experience testing web applications and APIs
  • Familiarity with AI/ML or non-deterministic system testing
  • Knowledge of CI/CD pipelines, Git, and automated regression testing
  • Strong analytical skills: able to define metrics and success criteria where outputs aren’t deterministic
  • Comfortable working in a fast-paced Agile environment with weekly sprints, pairing, and close collaboration with PM/UX/Dev

Nice To Haves

  • Knowledge of retrieval-augmented generation (RAG) pipelines
  • Experience with metrics/observability tooling (Grafana, Prometheus, Datadog)
  • Familiarity with containerized environments (Docker, Kubernetes)
  • Exposure to performance/load testing tools (Locust, k6, JMeter)

Responsibilities

  • Design and implement evaluation harnesses to measure retrieval accuracy, citation correctness, response quality, and overall system behavior
  • Develop automated tests for APIs, ingestion pipelines, and chat workflows
  • Collaborate with developers and product managers to define quality metrics (accuracy, latency, cost, hallucination rate)
  • Analyze logs, traces, and feedback signals to identify root causes of failures in AI-driven responses
  • Create regression suites to ensure changes to prompts, chunking, or embeddings don’t break existing behavior
  • Validate REST APIs and service integrations for resilience, correctness, and security
  • Contribute to observability by instrumenting metrics and dashboards for system performance
  • Participate in sprint planning and retrospectives, ensuring testability is built into features from day one

Benefits

  • Medical, Dental, & Vision Plans
  • 401(k)
  • FSA/HSA
  • Commuter Benefits
  • Tuition Assistance Plan
  • Vacation and Sick Time
  • Paid Parental Leave