Senior Data Science Consultant - Enterprise Complaints, Remediations & Loudspeaker

Wells Fargo & Company • Shoreview, MN

3d • $119,000 - $206,000 • Onsite

About The Position

Wells Fargo Enterprise Complaints, Remediations and Loudspeaker Analytics (ERA) is seeking a Senior Data Science Consultant focused on advanced analytics and AI solutions supporting voice‑of‑customer insights, risk identification, and operational decisioning. This role is strongly oriented toward applied Generative AI, with a primary focus on designing, experimenting with, and evaluating LLM‑enabled systems that operate on large volumes of unstructured customer interaction data. The consultant will own the end‑to‑end experimentation lifecycle for GenAI use cases — including prompt and agent design, iterative testing, error analysis, tuning, and evaluation — while leveraging traditional machine learning and NLP techniques where appropriate to support or augment GenAI solutions. The role emphasizes practical execution, rapid prototyping, and disciplined evaluation to ensure outputs are reliable, explainable, and suitable for use in risk‑aware, human‑in‑the‑loop decision environments.

Requirements

4+ years of data science experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Master's degree or higher in a quantitative discipline such as mathematics, statistics, engineering, physics, economics, or computer science
Ability to travel up to 10% of the time.
This position is NOT eligible for Visa sponsorship.
Ability to work on site per Wells Fargo's standard operating model in one of the listed locations.

Nice To Haves

Strong hands‑on experience with Python‑based experimentation and analytics workflows, working with large structured and unstructured text datasets; SQL proficiency required, SAS/Teradata a plus.
Demonstrated practical experience building and testing Generative AI solutions, including prompt engineering, prompt tuning, task decomposition, and agent‑style workflows using LLMs.
Proven ability to perform LLM evaluation and error analysis, including hallucination detection, output quality assessment, and false positive/false negative analysis.
Experience designing or implementing confidence, uncertainty, or risk‑scoring mechanisms for GenAI outputs to support review and escalation decisions.
Familiarity with Machine Learning and NLP modeling techniques, and the ability to apply them selectively to complement GenAI‑driven approaches.
Ability to design repeatable testing methodologies, benchmarks, and success metrics for GenAI systems operating in risk‑sensitive environments.
Strong communication skills, with the ability to clearly explain GenAI behaviors, limitations, and experimental findings to both technical and non‑technical audiences.
Experience producing high‑quality documentation covering prompts, experiments, evaluation methods, and system behaviors.
Comfortable operating in ambiguous problem spaces, with an execution mindset focused on experimentation, learning, and continuous improvement.
Strong statistical background and deep understanding of statistical methods for extracting insight from large, complex datasets.
Hypothesis-driven, investigative, or “detective-like” approach to identifying anomalies, edge cases, unexpected behaviors, and weak signals in both data and model outputs.
Comfort applying statistical reasoning to error analysis, uncertainty estimation, and validation of GenAI- and ML-driven results.

Responsibilities

Lead hands‑on Generative AI experimentation, including prompt engineering, prompt library development, and agent‑style workflows that support voice‑of‑customer understanding, issue identification, and decision support.
Design and execute systematic testing of LLM outputs across large collections of historical customer interaction data, evaluating behavior across tasks, data conditions, and edge cases.
Conduct deep error analysis of GenAI outputs, identifying hallucinations, weak or missing evidence, false positives, false negatives, and ambiguity, and translate findings into targeted prompt and system improvements.
Develop and apply GenAI evaluation frameworks, including rule‑based heuristics, statistical indicators, and LLM‑as‑a‑Judge techniques, to assess output quality, consistency, and risk.
Build and refine confidence and uncertainty scoring mechanisms for LLM decisions to support prioritization and secondary human review in higher‑risk scenarios.
Apply machine learning and NLP models where appropriate to complement GenAI solutions, such as feature extraction, classification, clustering, or signal generation.
Analyze complex structured and unstructured datasets to generate hypotheses, surface emerging risks, and identify opportunities where GenAI can augment or automate decision workflows.
Collaborate closely with product teams, engineers, and business stakeholders to align GenAI experimentation with operational workflows, risk tolerance, and real-world constraints.
Produce clear documentation of prompts, experiments, evaluation methods, and findings to ensure transparency, repeatability, and knowledge sharing.
Communicate GenAI behaviors, trade-offs, limitations, and risks effectively to non-technical stakeholders, helping set appropriate expectations for usage.
May mentor teammates by sharing best practices related to GenAI experimentation, evaluation, and responsible deployment.