Agent Evaluation Engineer

Comcast
Washington, DC
Onsite

About The Position

The Agent Evaluation team is responsible for testing whether AI agents return the correct and expected responses. We build the framework, metrics, and test cases that validate agent behavior, accuracy, and reliability before release. Our goal is to ensure agents perform consistently and meet product and user expectations.

Requirements

  • Bachelor's Degree (or some combination of coursework and experience, or extensive related professional experience)
  • 5-7 Years Relevant Work Experience
  • Skills: AI agents, benchmarking, CI/CD, evaluation metrics, large language models (LLMs), machine learning (ML), and a curious mindset

Nice To Haves

  • Experience in customer support AI or chatbot platforms
  • Understanding of responsible AI (bias, fairness, hallucination mitigation)

Responsibilities

  • Design and develop agent evaluation pipelines across development, staging, and production environments
  • Define and standardize evaluation metrics and benchmarks for conversational AI quality (accuracy, relevance, CX, safety)
  • Build automated and human-in-the-loop evaluation systems to assess agent performance
  • Manage and curate evaluation datasets, test sets, and annotation workflows
  • Enable continuous evaluation and monitoring of agents in production
  • Integrate evaluation into CI/CD pipelines to support safe and efficient releases
  • Conduct experiments, A/B testing, and case studies to drive improvements in agent quality
  • Partner with engineering and product teams to deliver high-quality AI solutions
  • Create technical documentation and drive best practices across teams
  • Mentor junior engineers and contribute to team growth