About The Position

We’re building a world-class Red Team to evaluate and improve the safety, reliability, and policy compliance of advanced AI systems. As Red Team Manager (Training, Quality & Roleplay Excellence), you will train and lead Red Team reviewers, develop experts who can produce high-quality adversarial and vulnerable-user roleplays, and set the quality bar for how we evaluate model behavior across harm categories. This is a hands-on role combining training leadership, content quality, and structured evaluation. You’ll define how the team learns, how we calibrate judgment, and how we ship consistent, actionable red teaming results at scale.

Requirements

  • 4+ years in trust & safety, AI evaluation, red teaming, security testing, content integrity, or similar applied roles.
  • Strong experience building training programs, rubrics, or QA frameworks for human judgment work.
  • Ability to evaluate roleplays and adversarial scenarios with consistency and high signal-to-noise.
  • Excellent written communication—clear, structured, and test-case oriented.
  • Experience leading or mentoring teams in fast-moving environments.

Nice To Haves

  • Experience red teaming LLMs, agentic systems, or tool-using models (prompt injection, data exfiltration, policy probing).
  • Familiarity with evaluation methods: gold sets, inter-rater reliability (or strong proxy measurement instincts), sampling strategies, and quality gates.
  • Background in one or more harm domains (self-harm, medical, violence, fraud, extremism, harassment).
  • Experience scaling an operational team and improving productivity without quality loss.

Responsibilities

Train & Lead Red Team Reviewers

  • Onboard new Red Team reviewers and run recurring calibration sessions to align on quality standards.
  • Set expectations and maintain consistency across reviewers for evaluation depth, writing quality, and reproducibility.
  • Build workflows for review (sampling, escalation, dispute resolution, feedback loops).

Train Experts on Roleplays, Model Behavior & Harm

  • Train red team experts on how to roleplay realistic user scenarios—including vulnerable users—without sensationalism.
  • Teach systematic adversarial techniques (prompt escalation, persistence strategies, boundary probing).
  • Help experts understand model failure modes: policy boundary drift, refusal weaknesses, hallucinations, unsafe compliance, and tone failures.

Create Training Materials & Resources

  • Build and maintain: red team playbooks and rubrics; example libraries (“gold standard” roleplays + evaluations); a defect taxonomy (what counts as a meaningful finding vs. noise); and brief modules for domain harm areas (self-harm, minors, extremism, medical, fraud, harassment, etc.).
  • Write clear guidance that enables new hires to become productive quickly.

Review & Evaluate Vulnerable User Roleplays

  • Review vulnerable-user roleplays produced by experts for realism, safety relevance, and correct targeting of failure modes.
  • Ensure roleplays are behaviorally plausible, ethically framed, actionable for model improvement, and consistent with internal policies and customer expectations.

Create Vulnerable User Roleplays

  • Personally produce high-quality vulnerable-user roleplays, including ambiguous edge cases, multi-turn scenarios, culturally nuanced or emotionally realistic interactions, and scenarios that stress safety, tone, and reliability.

Review Hiring Applicants

  • Own parts of the hiring loop for red team experts and reviewers: design work samples, evaluate candidate submissions, and provide structured feedback and hiring recommendations.
  • Help build a scalable standard for what “great” looks like in this role.

Benefits

  • Build a red team from the ground up with influence over process, hiring, and standards.
  • Work on high-impact AI safety problems with direct customer and product feedback loops.
  • Help define what “best-in-class” red teaming looks like in the industry.