Staff Applied Scientist - Dashboards

Datadog•New York, NY

$276,000 - $345,000•Hybrid

About The Position

The Dashboards product is Datadog's unified single pane of glass for metrics, logs, and traces: a comprehensive treasure trove of observability data. We are transforming Dashboards into an AI-native control surface and the central hub where every team moves seamlessly from question to insight to action. The goal is a guided experience that feels like having an expert SRE at your side, where the entry point is never an empty canvas.

We're hiring a Staff Applied Scientist to define and guarantee the quality of this AI system at scale. "Good" isn't one number. It spans answer quality, tool-selection accuracy (critical given the growing catalog of data sources and visualizations), retrieval relevance, latency, token cost, and end-to-end agent success.

The space is full of open questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when a single user query can lead the agent to choose among dozens of visualizations and data sources, both of which are growing month over month? How do you build a measurement system that catches regressions across all widget types and data sources (e.g., enforcing correct grouping, sorting, and time overrides) and that dozens of teams can easily use and extend? If those are the problems you want to spend your time on, come build this with us.

Requirements

You have a BS/MS/PhD in a scientific field, or equivalent experience.
10+ years of relevant engineering or applied science experience, including time as a technical lead.
Proven track record of leading ML or GenAI initiatives in a product-driven environment, from research through production.
Significant experience with evaluation, experimentation, or measurement of ML systems at scale.
You bring a strong product mindset and are comfortable driving initiatives across cross-functional teams.
You thrive in ambiguity and can make sound technical calls when the path isn’t yet defined.

Responsibilities

Own the evaluation strategy for Dashboards and for sister teams within our organization.
Define the metrics — offline and online, quality and cost, single-turn and multi-turn — that the team and the broader organization optimize against.
Build the eval datasets, golden traces, and regression harnesses that catch quality changes before they hit customers, and make those assets reusable by every team that builds dashboards and widgets through agents.
Drive measurable improvements to retrieval relevance, tool-selection accuracy, and context efficiency, partnering closely with the engineers on the team.
Provide technical leadership across the Dashboards team and the broader organization through design reviews, working groups, and mentorship.