Dashboards is Datadog's unified single pane of glass for metrics, logs, and traces, and a comprehensive treasure trove of observability data. We are transforming Dashboards into an AI-native control surface: the central hub where every team moves seamlessly from question to insight to action. The goal is a guided experience that feels like having an expert SRE at your side, where the entry point is never an empty canvas.

We're hiring a Staff Applied Scientist to define and guarantee the quality of this AI system at scale. "Good" isn't one number. It spans answer quality, tool-selection accuracy (critical given the growing catalog of data sources and visualizations), retrieval relevance, latency, token cost, and end-to-end agent success.

The space is full of open questions. How do you evaluate an agent end-to-end when the trajectory is non-deterministic? How do you score tool selection when a single user query can lead the agent to choose among dozens of visualizations and data sources, both of which are growing month over month? How do you build a measurement system that catches regressions across all widget types and data sources (e.g., enforcing correct grouping, sorting, and time overrides) and that dozens of teams can easily use and extend? If those are the problems you want to spend your time on, come build this with us.
Job Type
Full-time
Career Level
Senior