Model Policy Manager

OpenAI
San Francisco, CA
Hybrid

About The Position

The Model Policy team aligns model behavior with desired human values and norms. This role involves defining how OpenAI’s models should behave in high-risk or high-ambiguity contexts, such as agentic systems, multimodal systems, user safety, privacy, and other emerging risk domains. The ideal candidate can move across unfamiliar topics, reason from first principles, and turn ambiguity into practical model behavior. This role requires close collaboration with research, engineering, product, preparedness, and operations teams to build policies that are technically grounded, measurable, and responsive to real-world risk.

Requirements

  • Strong judgment about how advanced AI systems may affect real-world risk, especially in ambiguous, fast-moving, or high-impact areas.
  • Experience building or applying policies, taxonomies, harm models, threat models, or risk frameworks for complex technical, social, or adversarial systems.
  • Ability to move across domains without needing to be the deepest subject-matter expert in every area, while knowing when to seek expert input.
  • Ability to turn fuzzy questions into structured policy frameworks, evaluation criteria, operational guidance, and enforceable model behavior.
  • Comfortable using empirical evidence, including evaluations, red-teaming results, deployment observations, and model failure modes, to inform policy decisions.
  • Systems thinking across policy, data, graders, classifiers, training, deployment safeguards, measurement, monitoring, and escalation workflows.
  • Technical judgment about what model behavior can realistically be trained, measured, evaluated, and enforced at scale.
  • Ability to work well across research, engineering, product, policy, domain experts, and operational teams.
  • Clear writing about complex tradeoffs where safety, user value, and implementation constraints all matter.
  • Pragmatic approach to safety, focused on reducing real-world risk while preserving legitimate, beneficial, and socially valuable uses of AI.
  • Enjoyment of fast-paced, collaborative research environments where priorities shift as models, evidence, and risks change.
  • Grounded in implementation details, empirical results, and what can actually be trained or measured.

Responsibilities

  • Design and maintain model policies across safety-relevant domains, including dual-use, agentic, and emerging frontier-risk areas.
  • Translate risk and harm models into clear behavioral specifications, evaluation criteria, grading guidance, and system-level safeguards.
  • Define practical boundaries between beneficial uses of AI and assistance that could materially enable harm, exploitation, misuse, or unsafe outcomes.
  • Build policy artifacts that support model training, evaluation, and deployment.
  • Partner with safety researchers, engineers, product teams, and other stakeholders to operationalize policy into scalable model behavior and measurable safeguards.
  • Use red-teaming results, deployment data, model failures, over-refusals, under-refusals, and ambiguous edge cases to improve policy and evaluation quality over time.
  • Identify emerging capability areas where frontier AI systems could create new safety challenges or lower barriers to harm.
  • Study real-world deployments to identify where model behavior succeeds, fails, or drifts from the intended safety posture.
  • Combine longer-horizon safety research with hands-on launch and deployment work.
  • Contribute to system cards, safety reports, policy documentation, launch reviews, and external communications on OpenAI's approach to model safety and risk mitigation.
  • Design and run human data campaigns, including gold set construction, labeling guidance, calibration, adjudication, and eval coverage analysis, to ensure policies can be reliably measured and improved.

Benefits

  • Relocation support for new employees
  • Three in-house-prepared meals daily