Agent Evaluation Engineer

Comcast
Washington, DC
Onsite

About The Position

The Agent Evaluation team is responsible for testing whether AI agents return the correct and expected responses. We build the framework, metrics, and test cases that validate agent behavior, accuracy, and reliability before release. Our goal is to ensure agents perform consistently and meet product and user expectations.

Requirements

  • Bachelor's Degree (or some combination of coursework and experience, or extensive related professional experience)
  • 5-7 Years Relevant Work Experience
  • Skills: AI agents, benchmarking, CI/CD, evaluation metrics, large language models (LLMs), machine learning (ML), and a curious mindset

Nice To Haves

  • Experience in customer support AI or chatbot platforms
  • Understanding of responsible AI (bias, fairness, hallucination mitigation)

Responsibilities

  • Design and develop agent evaluation pipelines across development, staging, and production environments
  • Define and standardize evaluation metrics and benchmarks for conversational AI quality (accuracy, relevance, CX, safety)
  • Build automated and human-in-the-loop evaluation systems to assess agent performance
  • Manage and curate evaluation datasets, test sets, and annotation workflows
  • Enable continuous evaluation and monitoring of agents in production
  • Integrate evaluation into CI/CD pipelines to support safe and efficient releases
  • Conduct experiments, A/B testing, and case studies to drive improvements in agent quality
  • Partner with engineering and product teams to deliver high-quality AI solutions
  • Create technical documentation and drive best practices across teams
  • Mentor junior engineers and contribute to team growth