Copilot usage is growing across Microsoft 365 and custom agent experiences. To keep pace with diverse customer needs, regulatory requirements, and rapid innovation in the model ecosystem, we’re expanding our model choice across multiple providers and modalities. A robust, data-driven evaluation and observability platform ensures we select the right model for each scenario (balancing quality, safety, latency, and cost) and de-risks vendor lock-in while increasing resilience and agility. Within Microsoft, our teams already compare models on capabilities, cost, and latency and visualize responsible AI metrics (e.g., groundedness, coherence, relevance, similarity) in integrated dashboards. This role accelerates and productizes those patterns for Copilot Studio makers and platform teams.

You will build the backend systems, APIs, and evaluation pipelines that let Copilot and Copilot Studio safely and efficiently route requests across multiple model providers. You’ll partner with platform PMs, applied scientists, and reliability engineers to instrument end-to-end quality signals, govern rollouts, and create decisioning frameworks that map model/provider selection to Copilot core use cases (authoring, reasoning, retrieval-augmented generation, multi-agent orchestration, and domain-specific tasks).

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees