Manager, Agent Evaluation

Comcast | Washington, DC
8d | $183,064 - $274,595

About The Position

The Agent Evaluation team is responsible for testing whether AI agents return the correct and expected responses. We build the framework, metrics, and test cases that validate agent behavior, accuracy, and reliability before release. Our goal is to ensure agents perform consistently and meet product and user expectations.

Role Summary

The Manager, Agent Evaluation will lead the team responsible for building and scaling the evaluation framework that tests whether AI agents return accurate, reliable, and expected responses across real-world scenarios.

Requirements

  • Strong foundation in machine learning fundamentals and applied ML systems
  • Hands-on experience with model and agent evaluation methodologies
  • Familiarity with LLMs, AI agents, and prompt-driven systems
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch, TensorFlow)
  • Experience defining metrics, benchmarks, and experimentation frameworks
  • Solid understanding of MLOps practices, including model versioning, monitoring, and CI/CD
  • Ability to collaborate effectively with product, platform, and research teams
  • Clear communicator of technical trade-offs, evaluation insights, and results

Responsibilities

  • Lead and grow a team focused on agent and model evaluation
  • Define the strategy, roadmap, and standards for agent testing and validation
  • Oversee development of metrics, benchmarks, and testing frameworks to measure response quality, accuracy, safety, and performance
  • Ensure evaluation coverage aligns with product, UX, and business requirements
  • Partner closely with Product, Engineering, Research, and Platform teams to integrate evaluation into the development lifecycle
  • Drive experimentation and continuous improvement of evaluation methodologies
  • Establish reporting mechanisms to clearly communicate evaluation results and trade-offs to leadership
  • Implement best practices for model versioning, monitoring, and release validation
  • Stay current with advancements in LLMs, AI agents, and evaluation techniques