AI Red Teamer (LLM Generalist)

Handshake
Seattle, WA
$32 - $95

About The Position

As an AI Red Teamer, you will stress-test large language models by intentionally trying to break them. Rather than checking whether an answer is correct, you will design creative, adversarial prompts that expose vulnerabilities: unsafe content, bias, broken guardrails, hallucinations, prompt injection weaknesses, and unexpected behaviors. Your work directly supports AI safety and model robustness for leading research labs.

This is a generalist red teaming role. You will probe models across the full spectrum of risk categories, including content safety, CBRN (chemical, biological, radiological, nuclear), cybersecurity, persuasion and influence operations, child safety, self-harm, over-companionship, and regulatory compliance. Red teaming may span text, image, voice, and agentic model capabilities depending on project needs.

This role requires creativity, curiosity, and an ability to think like an adversary while operating with strong ethical judgment.

Requirements

  • Strong hands-on experience using multiple LLMs (ChatGPT, Claude, Gemini, open-source models, etc.)
  • Intuition for crafting adversarial prompts; familiarity with jailbreak or evasion techniques is a strong plus
  • Creative, adversarial problem-solving skills
  • Clear and thoughtful written communication
  • Strong ethical judgment and the ability to separate adversarial thinking from personal values
  • Self-directed, collaborative, and comfortable in feedback-heavy environments
  • Curiosity, persistence, and comfort with frequent failure in experimentation

Nice To Haves

  • Familiarity with Python or other scripting languages
  • Experience working with LLM APIs or evaluation tooling
  • Comfort with structured data annotation and rubric-based scoring
  • Prior work in trust and safety, content moderation, QA, or security research
  • Subject matter expertise in any high-risk domain (cybersecurity, chemistry, biology, medicine, law, finance, etc.)

Responsibilities

  • Craft creative prompts and multi-turn scenarios to stress-test AI guardrails across diverse risk categories
  • Discover ways around safety filters, restrictions, and defenses using jailbreak, evasion, and prompt injection techniques
  • Explore edge cases to provoke disallowed, harmful, or incorrect outputs
  • Evaluate and score model responses against structured harm taxonomies and severity rubrics
  • Document experiments clearly, including what you tried, why you tried it, and what it revealed
  • Review and refine adversarial prompts generated by other team members
  • Contribute to harm taxonomy development, calibration exercises, and inter-rater reliability work
  • Collaborate with engineers, data scientists, and researchers to share findings and strengthen defenses
  • Work with potentially disturbing content on a regular basis (see Content Warning below)
  • Stay current on jailbreaks, attack methods, and evolving model behaviors