Relevance Metrics Data or Applied Scientist

Microsoft, Redmond, WA
Hybrid

About The Position

Overview

A unique opportunity to join Bing Search, a global search engine that serves billions of queries daily from both people and large language models (LLMs). The Bing Metrics team is looking for passionate data scientists to work on the new generation of metrics and quality control for the Bing Grounding API. The team ensures that Bing returns high-quality, error-free, and authoritative results using a variety of approaches. We build complex pipelines that combine crowd judging and machine learning steps to verify suspected quality issues. We now actively use LLMs such as ChatGPT as judges to evaluate the quality of search results at multiple levels (query, answer, and whole page) and to generate insights for the teams responsible for these experiences.

As part of an international, distributed team, you will be responsible for retrieval-augmented generation (RAG) quality metrics within Bing Search. The role gives you the opportunity to work with more than 80 teams across Bing and to strongly influence the relevance and result quality of the entire platform. We are an established core team in Bing with very high visibility and impact.

We are looking for a talented engineer or data scientist with a passion for LLMs, and specifically RAG, who can design, implement, and test complex data pipelines built on top of LLMs, create new tools for running multi-step prompts that evaluate search engine quality, and generate actionable insights for partner teams.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S., or within a 25-mile commute of a non-U.S., country-specific location, are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.

Requirements

  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR equivalent experience.

Nice To Haves

  • Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Master's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR Bachelor's Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 7+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
  • OR equivalent experience.
  • 3 years of T-SQL.
  • Experience with, or deep interest in, RAG and large language models.
  • Passion for building metrics for complex multi-step systems.
  • Passion for prompt engineering and text generation with LLMs.
  • Interest in designing dashboards for data visualization in novel ways.
  • Experience with Machine Learning.
  • 5+ years of C#, Python, Java or any other major programming language.
  • 3 years of SQL.
  • Experience with large language models.
  • Ability to work independently, with solid collaboration and communication skills.

Responsibilities

  • Design and implement metrics for RAG with Bing Web Search and other APIs.
  • Build pipelines and dashboards for Bing Grounding quality.
  • Use LLMs in LLM-as-a-judge settings for data evaluation.
  • Engineer prompts for text-only and multimodal LLMs for data processing and insight generation.
  • Design and implement end-to-end pipelines, from sampling anomalies in logs, through prompt engineering, to automatically updated dashboards.
  • Apply classical ML (feature engineering and model training, text and image embeddings) alongside LLMs to augment data analysis and processing pipelines.
  • Help teams build new, innovative search experiences with Bing.