Position Summary...

What you'll do...

Key Responsibilities

AI Evaluation & Transparency
Manage dashboards that connect capability metrics with business metrics to surface what's working and what isn't.
Maintain golden datasets for benchmarking model relevance, execution accuracy, personalization, and compliance.
Partner with AI Tech and Tech Ops to calibrate LLM-as-Judge and ensure automated evaluation remains aligned with human judgment.

Shopper Experience Quality & Audits
Conduct periodic audits of shopper-facing AI experiences (answers, nudges, recommendations, in-shop chat).
Define and monitor KPIs for intent recognition, execution accuracy, and personalization precision.
Run cross-journey evaluations to identify friction or divergence patterns.

SOP Governance & Continuous Improvement
Own and evolve SOPs for annotation, labeling, and evaluation so they reflect changing features and customer contexts.
Capture new edge cases and codify them into updated labeling guides.
Ensure all SOPs align with shopper-first outcomes and compliance.

Human + AI Divergence Monitoring
Track variance between human labels, automated judgments, and live model outputs.
Create structured escalation paths for high-risk failures (irrelevant, unsafe, or biased responses).
Quantify divergence trends and drive remediation with Product and Engineering.

Vendor & Partner Management
Manage external labeling partners, ensuring they meet throughput, accuracy, and compliance SLAs.
Deliver training grounded in real Sparky interactions and golden dataset examples.
Audit vendor output and hold partners accountable for quality metrics.

Closed-Loop Feedback & Model Integration
Build and manage the continuous feedback loop that connects shopper signals, human review and annotation, and automated LLM evaluation.
Ensure that every signal from real interactions feeds directly into model training, experience design, and KPI improvement, driving measurable gains in personalization, nudge effectiveness, and conversational trust.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees