AI Red-Teamer - Adversarial AI Testing

Weekday AI
$50 - $111 · Remote

About The Position

We are seeking AI Red-Teamers to help test and strengthen modern AI systems through adversarial evaluation. In this role, you will challenge AI models with carefully designed inputs to uncover weaknesses and vulnerabilities and to generate high-quality data that improves the safety, reliability, and robustness of conversational AI. The work focuses on proactively identifying potential risks before they surface in real-world use: by systematically probing AI systems, you will help ensure they respond safely, accurately, and responsibly across a wide range of scenarios.

This role may include reviewing AI outputs that reference sensitive topics such as bias, misinformation, or harmful behaviors. All work is text-based, and participation in higher-sensitivity projects is optional and supported with clear guidelines and wellness resources.

Requirements

  • You have prior red-teaming experience, such as adversarial AI testing, cybersecurity, or socio-technical risk analysis
  • You naturally think adversarially, exploring ways to push systems to their limits and uncover weaknesses
  • You prefer structured methodologies, using frameworks and benchmarks rather than ad-hoc testing
  • You communicate risks and vulnerabilities clearly to both technical and non-technical audiences
  • You are comfortable working across multiple projects and adapting to new evaluation challenges

Nice To Haves

  • Adversarial Machine Learning: jailbreak datasets, prompt injection attacks, RLHF/DPO vulnerabilities, or model extraction techniques
  • Cybersecurity: penetration testing, exploit development, reverse engineering
  • Socio-technical risk analysis: harassment or misinformation testing, abuse pattern analysis
  • Creative adversarial thinking: backgrounds in psychology, acting, writing, or other disciplines that support unconventional attack strategies

Responsibilities

  • Red-team AI models and agents by testing jailbreak attempts, prompt injections, misuse scenarios, and exploit strategies
  • Generate high-quality human evaluation data by annotating model failures, classifying vulnerabilities, and identifying systemic risks
  • Apply structured testing methodologies using taxonomies, benchmarks, and playbooks to ensure consistent evaluation
  • Document findings clearly and reproducibly, producing reports, datasets, and adversarial test cases that teams can act upon
  • Work across multiple projects, supporting different AI systems and evaluation objectives