Researcher, Automated Red Teaming

OpenAI · San Francisco, CA

About The Position

In this role, you will lead the Automated Red Teaming (ART) effort: building scalable, research-driven systems that continuously discover failure modes in our models and mitigations, and translating those findings into actionable, production-facing improvements. The goal is to maximize the counterfactual reduction in expected harm by finding the highest-leverage, least-covered weaknesses early and reliably.

Requirements

  • Feel a strong pull toward AI safety, and you’re motivated by reducing real-world catastrophic risk (not just publishing cool results).
  • Love breaking systems (responsibly) — you get energy from finding weird, high-severity failure modes and turning them into concrete fixes.
  • Have strong applied research instincts, especially around evaluations: you’re good at designing experiments that are reproducible, interpretable, and hard to fool.
  • Bring hands-on experience with LLMs and agents, including multi-turn behaviors, tool use, and the ways models adapt to constraints.
  • Are comfortable building scalable automation, not just prototypes — you can turn red-teaming ideas into pipelines that run continuously and produce high-signal outputs.
  • Have solid software engineering fundamentals (data structures, algorithms, testing discipline) and you can work effectively in a production-adjacent environment.
  • Think in threat models and incentives, and you naturally ask “what would an attacker do next?” or “how would this fail under pressure?”
  • Can translate messy findings into action, communicating clearly with researchers, engineers, product, and policy — and driving alignment on what to fix first.
  • Care about efficiency and prioritization, and you’re happy to say “no” to low-leverage work to focus on what moves the risk needle.

Nice To Haves

  • Experience in adversarial ML, security research / red teaming, abuse prevention systems, or large-scale eval infrastructure.

Responsibilities

  • You will own the research and technical direction for automated red teaming across catastrophic risk areas, with an initial emphasis on:
      ◦ Automated classifier jailbreak discovery (cyber and bio)
      ◦ Automated bio threat-development elicitation (worst-feasible planning uplift)
      ◦ CoT monitoring evasion probing (and adjacent loss-of-control evaluations)
  • You will partner tightly with:
      ◦ Vertical risk teams (Cyber, Bio, Loss of Control) to define threat models, prioritize targets, and land mitigations
      ◦ The Classifiers team to turn discovered attacks into training data, evals, and measurable robustness gains
      ◦ Product, engineering, and safety stakeholders to ensure ART outputs are operationally useful (not just interesting)