AI Evaluation Lead

BlueCross BlueShield of Tennessee • Chattanooga, TN

Remote

About The Position

At BlueCross BlueShield of Tennessee, we’re building a member-facing virtual AI assistant that helps people get answers faster, navigate benefits with confidence, and reduce friction across the healthcare journey. To ensure this experience is safe, trustworthy, and consistently high-quality, we’re hiring an AI Evaluation Lead to own the evaluation strategy and the datasets that prove our assistant is working, both before it reaches members and as it evolves over time.

In this role, you’ll partner with digital product, technical, operations, compliance, and customer service teams to define what “good” looks like, build gold-standard datasets that reflect real member needs, and drive an enterprise-grade evaluation framework that improves performance, reduces risk, and accelerates responsible delivery.

To do that, you’ll need:

Hands-on experience with quality methodologies
Strong business knowledge and data curation experience
Proficiency with Python and analytics tooling; familiarity with conversational AI evaluation patterns

If you’re inspired by our mission – peace of mind through better health – and ready for a role where your technical leadership directly influences AI strategy and systems at scale, we’d love to hear from you.

Note: This is a fully remote role, but onsite interviews at our Chattanooga, TN headquarters may be required. Sponsorship is not available for this role.

Requirements

Bachelor's degree in STEM (Science, Technology, Engineering, or Math) or a related field, or equivalent work experience, required.
5+ years of relevant work experience in technology, software delivery, analytics, quality assurance, or healthcare (academic experience included), or related equivalent experience.
3+ years of experience working within the software development life cycle (SDLC) and/or formal quality management practices (e.g., test planning, release governance, process controls, or continuous improvement).
Demonstrated experience partnering with technical teams to define requirements, acceptance criteria, and measurable outcomes; communicating findings to both technical and non-technical stakeholders.
Proven ability to design and execute evaluation approaches such as functional testing, regression testing, integration testing, and/or production validation and monitoring.
Experience building repeatable reporting and insights (dashboards, KPIs, defect trends) to guide decisions and improve delivery outcomes.
Strong understanding of SDLC practices, quality assurance methodologies, and test strategy development (manual and automated).
Ability to design practical measurement frameworks and quality gates; comfort working with metrics and basic statistical concepts to support decision-making.
Proficiency with common productivity and documentation tools, including Microsoft Office (Outlook, Word, Excel, and PowerPoint).
Working familiarity with defect tracking and delivery tooling (e.g., Jira/Azure DevOps) and with automated testing concepts; scripting/analysis experience (e.g., Python) is a plus.
Demonstrated success leading cross-functional initiatives from conception through deployment; strong organizational skills and ability to manage multiple complex tasks.
Exceptional ability to interpret and translate technical concepts into information meaningful to project team members, business personnel, and leadership.
Must be able to communicate effectively and influence both technical and non-technical co-workers and stakeholders.
Working knowledge of privacy, security, and compliance considerations relevant to healthcare data (e.g., HIPAA) is strongly preferred.

Nice To Haves

Master’s or PhD degree in a relevant field (e.g., Computer Science, Information Systems, Analytics) preferred.
Experience working with cloud-based delivery environments and modern DevOps practices is preferred (GCP, AWS, or Azure).
Experience evaluating or validating AI/ML/Generative AI-enabled systems (e.g., model output review, quality rubrics, monitoring) or partnering with data science teams on quality/readiness preferred.

Responsibilities

Lead the development and operationalization of evaluation and quality programs for technology solutions (including AI-enabled features) tailored for healthcare payer use cases.
Define success criteria, acceptance thresholds, and release readiness “go/no-go” gates across the delivery lifecycle (requirements → build → test → deploy → monitor).
Establish scalable evaluation frameworks that combine functional testing, non-functional testing (performance, reliability, security), and user experience validation.
Translate stakeholder needs into measurable requirements; develop test strategies, test plans, traceability, and evidence packages suitable for audit and governance expectations.
Develop and maintain high-quality test data, scenarios, and validation suites; coordinate and/or lead user acceptance testing and structured feedback cycles.
Partner with engineering and platform teams to embed automated checks into CI/CD pipelines (e.g., regression, integration, smoke tests) and to standardize quality reporting.
Define and track quality metrics (e.g., defect leakage, escaped defects, cycle time, stability, incident rates) and drive continuous improvement through root-cause analysis.
Facilitate defect triage and prioritization; coordinate cross-functional resolution plans and ensure clear communication of quality risks and tradeoffs.
Contribute to risk assessments, compliance-aligned controls, and documentation practices appropriate for healthcare data and regulated environments.
Serve as a trusted advisor to Lines-of-Business, providing pragmatic guidance on quality, validation approaches, and release readiness for complex initiatives.
Stay current on best practices in software quality, testing methodologies, and evaluation tooling; champion adoption of effective standards across teams.
Mentor and guide team members on evaluation design, experiment rigor, documentation, and operational excellence.