About The Position

The AI Quality & Evaluation Lead enables responsible, scalable AI adoption by defining technical quality standards, evaluation frameworks, and control requirements across the AI lifecycle. As the function owner of AI quality and robustness standards, the role translates enterprise trust principles and statistical rigor into practical evaluation standards and enforceable technical controls. The role partners closely with AI Engineering, Data Science, the QA CoE, and Product teams to define quality and performance requirements that are embedded by design, ensuring AI solutions (including RAG systems and Agents) are accurate, grounded, and aligned with enterprise risk and trust expectations.

Requirements

  • Technical Depth: 5–8+ years of experience in Data Science, ML Engineering, or AI Quality, with a focus on evaluation and statistical validation.
  • System Design: Practical experience partnering with engineers to design RAG, LLM-based Agents, or traditional ML pipelines.
  • Analytical Skills: Expert-level Python (Pandas, Scikit-learn) and experience with evaluation frameworks (e.g., RAGAS, TruLens, or MLflow).
  • Governance Mindset: Demonstrated ability to translate abstract trust concepts into mathematical metrics and enforceable technical controls.
  • Stakeholder Influence: Demonstrated ability to work without direct authority, driving quality adoption across engineering and product teams through enablement.
  • Communication: Ability to bridge the gap between high-level governance policy and low-level code implementation.
  • Bachelor’s degree in a technical field (e.g., Computer Science, Computer Systems Design) or equivalent professional experience.

Responsibilities

  • Partnering directly with AI Engineers, Application Developers and Data Scientists during the design phase to define technical quality acceptance criteria and fit-for-use requirements.
  • Embedding quality considerations into model and system architecture from the onset, specifically for complex patterns like RAG and autonomous Agents.
  • Defining ground-truth requirements and evaluation dataset standards, and partnering with Data Science teams to ensure datasets reflect production-level complexity.
  • Defining quality and evaluation expectations for third-party AI systems and vendor-supplied models, ensuring consistent governance standards regardless of model origin.
  • Designing and maintaining a structured evaluation framework that assesses AI systems against defined quality bars (e.g., goodness-of-fit, calibration, stability).
  • Developing automated metrics for Generative AI performance, including Groundedness (hallucination detection), Faithfulness, Completeness, and other domain-relevant metrics.
  • Defining and operationalizing fairness and bias evaluation criteria, including demographic parity assessments and disparate impact testing for client-facing AI systems.
  • Calibrating evaluation thresholds and monitoring cadence to AI risk tier, ensuring proportionate controls without over-engineering lower-risk use cases.
  • Identifying and documenting technical AI governance controls that enable automated compliance with performance and risk obligations.
  • Establishing drift and ongoing monitoring requirements, defining statistical triggers for feature and concept drift that necessitate model intervention.
  • Developing clear control statements that articulate the expected evidence artifacts (e.g., test results, model cards) required for go/no-go decisions.
  • Providing objective, data-driven evaluation outputs that support AI governance reviews and risk classification.
  • Translating governance expectations into clear, testable quality criteria that engineering teams can apply consistently within their CI/CD pipelines.
  • Maintaining authoritative documentation of AI controls to support audit, regulatory review, and internal assurance activities.

Benefits

  • health, dental, and vision coverage starting Day One
  • wellbeing programs
  • retirement plans with contribution matching
  • generous time off
  • parental leave
  • continuing education
  • career growth opportunities
  • competitive total rewards package
  • training