About The Position

In this role, you will validate AI and ML-powered healthcare solutions across the full development lifecycle to ensure data quality, model performance, reliability, and safe deployment in production environments. You will apply Agile and Software Development Lifecycle (SDLC) Quality Assurance (QA) methodologies to validate functional software components and the analytical outputs they generate. You will design and execute data-driven and automated test strategies, including model evaluation, prompt regression testing, dataset profiling, and end-to-end pipeline validation. You will partner with data science, engineering, product, and security teams to define measurable quality gates and deliver compliant, explainable, and dependable AI experiences that drive client value.

Requirements

  • Relevant degree preferred.
  • 2 or more years of relevant experience required.
  • Experience in Quality Assurance with a focus on data testing, analytics validation, and BI/ETL systems required.
  • Experience validating ML or generative AI-based applications, including model evaluation and data quality assessment.
  • Experience with QA automation frameworks (Selenium or similar) and Python for scripting, data validation, and automation.
  • Experience evaluating Large Language Model systems, including prompt regression testing and automated or human-in-the-loop judging methodologies.
  • Familiarity with RAG evaluation concepts, including retrieval quality, context relevance, faithfulness, and safety testing.
  • Experience designing AI evaluation metrics and monitoring model performance in production environments.
  • Understanding of Agile methodologies and CI/CD practices.
  • Strong analytical, documentation, and communication skills; self-starter who thrives in fast-paced, iterative environments and drives quality initiatives amid ambiguity and shifting priorities.

Responsibilities

  • Develop and execute test strategies for ML and generative AI-powered applications.
  • Create and execute test cases, plans, and scripts based on story-defined acceptance criteria, and review requirements, functional, technical, and use case documents to determine testability.
  • Design and maintain automated test suites using QA automation frameworks to validate UI, API, backend services, and AI-enabled platforms.
  • Integrate automated testing into CI/CD pipelines to enable continuous quality validation and rapid feedback loops.
  • Design and maintain evaluation frameworks for Large Language Models (LLMs), including automated scoring and LLM-as-a-judge methodologies.
  • Develop prompt regression test suites to detect performance degradation across model and prompt versions.
  • Evaluate generative AI systems for hallucination risk, factual consistency, grounding accuracy, and safety compliance.
  • Conduct model evaluation, regression testing, and drift monitoring in development and production environments.
  • Build dashboards and monitoring tools to detect degraded evaluation scores, drift, or safety risks and support proactive triage.
  • Design and implement proactive AI-driven alerting and recommendation systems embedded within dashboards and user workflows.
  • Automate dashboard metric generation and refresh pipelines using Python and data workflows.
  • Partner with cross-functional teams to define AI quality standards, acceptance criteria, and release gates.
  • Investigate defects, analyze root causes, and recommend corrective actions to improve reliability and performance.