Identify and prioritize agent reasoning and model capability opportunities to improve agentic Excel capabilities.
Define the quality bar for agent reasoning, making informed tradeoffs across capability, latency, cost, and reliability.
Translate real user workflows and mental models into reasoning requirements and evaluation criteria.
Define success metrics and evaluation frameworks to measure reasoning quality, correctness, and robustness.
Design data collection and labeling tasks to evaluate models and generate training data for fine-tuning and alignment.
Prototype and validate new agent behaviors, reasoning patterns, and feature directions.
Develop and iterate on prompts and policies to guide consistent, high-quality model behavior across scenarios.
Deploy, monitor, and analyze A/B experiments for model and reasoning changes in production.
Incorporate user feedback and telemetry to continuously improve model behavior and reliability.
Collaborate cross-functionally with research, applied ML, data, infrastructure, and product teams.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees