AI Testing Engineer

Crowe LLP · Dallas, TX
Remote

About The Position

About Crowe AI Transformation

Everything we do is about making the future of human work more purposeful. We do this by leveraging state-of-the-art technologies, modern architecture, and industry experts to create AI-powered solutions that transform the way our clients do business. The new AI Transformation team will build on Crowe’s established AI foundation, furthering the capabilities of our Applied AI / Machine Learning team. By combining generative AI, machine learning, and software engineering, this team empowers Crowe clients to transform their business models through AI, regardless of their current stage of AI adoption. As a member of AI Transformation, you will help distinguish Crowe in the market and drive the firm’s technology and innovation strategy. The future is powered by AI. Come build it with us.

About the Team

We invest in expertise. You’ll have the time, space, and support to go deep in your projects and build lasting technical and strategic mastery. You’ll work with developers, product stakeholders, and project managers as a trusted leader and domain expert. We believe in continuous growth: our team is committed to professional development and knowledge sharing. We protect balance: our distributed team culture is grounded in trust and flexibility, and we offer unlimited PTO, a flexible remote work policy, and a supportive environment that prioritizes sustainable, long-term performance.

About the Role

The AI Testing Engineer I (Senior Staff) plays a critical role in ensuring the quality, reliability, safety, and compliance of enterprise AI and machine learning systems. This role leads advanced testing and validation efforts, architects automated evaluation frameworks, and assesses model behavior across functional and non-functional dimensions, including accuracy, robustness, bias, drift, and safety. Working closely with AI engineering, data science, security, and product teams, the engineer defines testing strategies, builds evaluation datasets, and identifies risks across predictive and generative AI systems. As a senior staff-level contributor, the role establishes platform-wide testing standards, integrates AI testing into CI/CD workflows, mentors other engineers, and supports responsible AI adoption. This position significantly advances the maturity of AI validation practices and ensures dependable, trustworthy deployment of AI capabilities across the organization.

Requirements

  • 4+ years of experience in software testing, ML engineering, data science, or related roles.
  • Strong proficiency in Python and automated testing frameworks.
  • Deep understanding of model evaluation techniques, including precision/recall, calibration, robustness, and stability testing (a minimal sketch of these metrics follows this list).
  • Familiarity with LLM evaluation metrics, safety testing approaches, and structured test design.
  • Demonstrated ability to diagnose complex model, data, and pipeline failures.
  • Strong collaboration and communication skills across technical and non-technical teams.
  • Willingness to travel occasionally for cross-functional planning and collaboration.
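
As a purely illustrative sketch of the evaluation techniques named above (precision/recall and a basic calibration check), the Python snippet below scores a toy classifier with scikit-learn. The synthetic data, model, decision threshold, and bin count are placeholders, not anything specific to Crowe's stack.

```python
# Minimal sketch of offline model evaluation: precision/recall plus a crude
# calibration check. Data, model, and thresholds are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, brier_score_loss
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
preds = (proba >= 0.5).astype(int)

print("precision:", precision_score(y_test, preds))
print("recall:   ", recall_score(y_test, preds))
print("brier:    ", brier_score_loss(y_test, proba))

# Calibration: predicted probabilities should track observed frequencies per bin.
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
print("max calibration gap:", np.abs(frac_pos - mean_pred).max())
```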

Nice To Haves

  • Bachelor’s degree in Computer Science, Engineering, Data Science, or a related technical field, or equivalent experience.
  • Experience testing AI/ML systems in cloud-based environments.
  • Hands-on experience with cloud ML platforms such as SageMaker, Vertex AI, or Azure ML.
  • Familiarity with containerization (Docker), Kubernetes, and distributed test execution.
  • Experience integrating automated AI testing into CI/CD pipelines (e.g., GitHub Actions or similar tools); a threshold-test sketch of this pattern follows this list.
  • Experience with monitoring and logging systems for post-deployment model validation.
  • Advanced experience testing generative AI systems, including LLMs for accuracy, bias, safety, and hallucinations.
  • Familiarity with RAG evaluation workflows and vector databases (e.g., FAISS, Pinecone, Weaviate).
  • Experience with prompt engineering, adversarial prompting, and synthetic data generation.
  • Familiarity with Hugging Face evaluation tools and testing fine-tuned models (e.g., LoRA, QLoRA).
  • Testing, quality engineering, or cloud certifications.
  • Excellent analytical, documentation, and mentorship skills.
  • Ability to collaborate effectively in hybrid or remote team environments and support extended hours during critical model releases or incidents.
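
One common way to integrate AI testing into CI/CD, as mentioned above, is to express acceptance thresholds as ordinary pytest tests so a runner such as GitHub Actions fails the build when quality regresses. The sketch below is a minimal, self-contained illustration of that pattern; the inline toy model and the 0.90/0.85 thresholds are assumptions, not Crowe's pipeline.

```python
# Sketch of acceptance-threshold tests that a CI runner (e.g. GitHub Actions)
# can execute via `pytest` on every merge. The tiny inline model stands in for
# whatever registered model and frozen holdout set a real pipeline would load.
import numpy as np
import pytest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

MIN_ACCURACY = 0.90  # illustrative acceptance thresholds
MIN_RECALL = 0.85

@pytest.fixture(scope="module")
def predictions():
    # A real suite would load a registered model and a versioned holdout set;
    # here both are synthesized so the sketch is self-contained.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(2000, 4))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
    model = LogisticRegression().fit(X_tr, y_tr)
    return y_te, model.predict(X_te)

def test_accuracy_meets_threshold(predictions):
    y_true, y_pred = predictions
    assert accuracy_score(y_true, y_pred) >= MIN_ACCURACY

def test_recall_meets_threshold(predictions):
    y_true, y_pred = predictions
    assert recall_score(y_true, y_pred) >= MIN_RECALL
```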

Responsibilities

  • Designing comprehensive testing strategies for predictive models, generative AI systems, and end-to-end ML pipelines.
  • Leading the development of automated test harnesses, evaluation suites, and validation tools integrated into CI/CD workflows.
  • Analyzing model outputs for correctness, safety, fairness, robustness, and stability across diverse test scenarios.
  • Building synthetic datasets, challenge sets, and adversarial test cases to uncover model weaknesses.
  • Evaluating LLM and generative model behavior, including hallucination rates, prompt sensitivity, and retrieval accuracy (see the prompt-sensitivity sketch after this list).
  • Collaborating with engineering and data science teams to define evaluation criteria, KPIs, and acceptance thresholds.
  • Troubleshooting complex ML system issues such as performance degradation, drift, or unexpected failure patterns.
  • Implementing post-deployment monitoring systems to continuously validate model behavior in production.
  • Documenting testing methodologies, findings, and recommendations to inform system improvements.
  • Guiding junior engineers and QA specialists in advanced AI testing techniques and tools.
  • Ensuring adherence to enterprise responsible AI, safety, security, and compliance standards.
  • Identifying reliability and trust risks and contributing to mitigation strategies.
  • Contributing to AI platform architectural decisions to improve testability and observability.
  • Researching and evaluating emerging AI testing methodologies, benchmarks, and tooling ecosystems.
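
As a hedged example of the prompt-sensitivity evaluation described above: paraphrases of the same question should produce consistent answers. In the sketch below, ask_model is a stub standing in for whatever LLM endpoint a team actually uses, and the exact-match normalization is deliberately simple; real suites would typically compare answers with semantic similarity or an LLM judge.

```python
# Sketch of a prompt-sensitivity check: paraphrased prompts should produce
# consistent answers. ask_model is a stub for a real LLM API call.
from collections import Counter

def ask_model(prompt: str) -> str:
    # Stub for illustration only; replace with a call to your LLM endpoint.
    return "The company was founded in 1952."

PARAPHRASES = [
    "What year was the company founded?",
    "In which year was the company established?",
    "When was the company founded (year only)?",
]

def normalize(answer: str) -> str:
    # Deliberately simple normalization; real suites use semantic comparison.
    return answer.strip().lower().rstrip(".")

def prompt_sensitivity_rate(prompts) -> float:
    answers = [normalize(ask_model(p)) for p in prompts]
    majority_count = Counter(answers).most_common(1)[0][1]
    # Fraction of paraphrases that disagree with the majority answer.
    return 1.0 - majority_count / len(answers)

if __name__ == "__main__":
    rate = prompt_sensitivity_rate(PARAPHRASES)
    assert rate <= 0.34, f"prompt sensitivity too high: {rate:.2f}"
    print("prompt-sensitivity rate:", rate)
```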