In this role, you will help build and operate AI-powered middle-tier services that support conversational experiences within widely used productivity applications. You will focus on prompt evaluation, testing, and automation to ensure that AI responses are accurate, reliable, and aligned with business and user expectations. You will work closely with engineering, product, and data partners to evaluate large language model (LLM) behavior, design test strategies, implement supporting code, and continuously improve prompt quality and system performance.