Manager, Agent Evaluation

Comcast, Washington, DC

$183,064 - $274,595

About The Position

The Agent Evaluation team is responsible for testing whether AI agents return correct and expected responses. We build the framework, metrics, and test cases that validate agent behavior, accuracy, and reliability before release. Our goal is to ensure agents perform consistently and meet product and user expectations.

Role Summary: The Manager, Agent Evaluation will lead the team responsible for building and scaling the evaluation framework that tests whether AI agents return accurate, reliable, and expected responses across real-world scenarios.

Requirements

Strong foundation in machine learning and applied ML systems
Hands-on experience with model and agent evaluation methodologies
Familiarity with LLMs, AI agents, and prompt-driven systems
Proficiency in Python and modern ML frameworks (e.g., PyTorch, TensorFlow)
Experience defining metrics, benchmarks, and experimentation frameworks
Solid understanding of MLOps practices, including model versioning, monitoring, and CI/CD
Ability to collaborate effectively with product, platform, and research teams
Clear communicator of technical trade-offs, evaluation insights, and results

Responsibilities

Lead and grow a team focused on agent and model evaluation
Define the strategy, roadmap, and standards for agent testing and validation
Oversee development of metrics, benchmarks, and testing frameworks to measure response quality, accuracy, safety, and performance
Ensure evaluation coverage aligns with product, UX, and business requirements
Partner closely with Product, Engineering, Research, and Platform teams to integrate evaluation into the development lifecycle
Drive experimentation and continuous improvement of evaluation methodologies
Establish reporting mechanisms to clearly communicate evaluation results and trade-offs to leadership
Implement best practices for model versioning, monitoring, and release validation
Stay current with advancements in LLMs, AI agents, and evaluation techniques
