Data Scientist - User Risk Measurement

OpenAI•San Francisco, CA

Hybrid

About The Position

About the Team

The Intelligence & Investigations (I2) team detects and disrupts abuse and strategic risk so people can use our products safely. Within I2, Strategic Intelligence & Analysis is building a first-of-its-kind user-risk measurement function: policy-grounded baselines, confidence intervals, and attribution that reveal how real users engage with frontier AI, and how that changes when we ship mitigations. We sit between Safety Systems, Data Science, Integrity, and Product, turning heterogeneous safety signals into decision intelligence that appears in executive dashboards, launch/post-launch readouts, and weekly briefs.

About the Role

As a Data Scientist on this new function, you will help define how the industry measures complex, adaptive human–AI behavior at scale. You will establish trustworthy baselines for user-level risk, create attribution that links changes to real-world mitigations and events, and deliver concise narratives and metrics that guide safety strategy and product decisions. The work spans end-to-end ownership, from framing the questions to delivering decision-ready outputs with clear quality standards and governance, while collaborating closely across Safety Systems, Data Science, Product, and Policy.

This role is based in San Francisco, CA (hybrid, 3 days/week). Relocation support is available.

Requirements

Have 3–6+ years in data science, measurement/causal inference, or risk analytics in high-stakes domains
Are strong in sampling, inference, uncertainty quantification, probability theory, and rare-event estimation; comfortable with time-varying metrics
Write solid Python and SQL; are fluent with data warehouses and productionizing notebooks/pipelines
Communicate crisply, translating complex estimators into clear actions for executives and cross-functional partners

Nice To Haves

Experience with Airflow DAGs or other ETL pipelines, Databricks, survival analysis, streaming/online detection, classifier evaluation/QA, privacy reviews/audit trails, or integrity/fraud/safety work

Responsibilities

Define the measurement framework for user-level risk across products and cohorts: scope the questions that matter and align on clear, policy-grounded definitions
Establish baselines and statistical confidence for core metrics: prevalence, intensity, trends, and cohort dynamics
Build decision-ready reporting surfaces: executive dashboards, weekly briefs, and launch readouts that translate insights into action
Clean and organize ambiguous data from disparate sources, with an eye toward building automated pipelines and systems
Create attribution and change-tracking: connect shifts in user behavior to mitigations, product changes, and external events
Partner across Safety Systems, Data Science, Integrity, Product, and Policy: ensure one coherent analytics entry point and consistent standards
Uphold quality, privacy, and governance: document methods, ensure auditability, and maintain durable measurement hygiene
Monitor signals for emerging risks and anomalies: recommend priorities that reduce harmful usage and improve user safety
Communicate clearly and concisely: deliver insights and trade-offs to executives and engineering teams in language that drives decisions