AI Red-Teamer - Adversarial AI Testing

Weekday AI
$50 - $111 · Remote

About The Position

We are seeking AI Red-Teamers to help test and strengthen modern AI systems through adversarial evaluation. In this role, you will challenge AI models with carefully designed inputs to uncover weaknesses and vulnerabilities and to generate high-quality data that improves the safety, reliability, and robustness of conversational AI. The work focuses on proactively identifying potential risks before they surface in real-world use: by systematically probing AI systems, you will help ensure they respond safely, accurately, and responsibly across a wide range of scenarios.

This role may include reviewing AI outputs that reference sensitive topics such as bias, misinformation, or harmful behaviors. All work is text-based, and participation in higher-sensitivity projects is optional and supported with clear guidelines and wellness resources.

Requirements

  • You have prior red-teaming experience, such as adversarial AI testing, cybersecurity, or socio-technical risk analysis
  • You naturally think adversarially, exploring ways to push systems to their limits and uncover weaknesses
  • You prefer structured methodologies, using frameworks and benchmarks rather than ad-hoc testing
  • You communicate risks and vulnerabilities clearly to both technical and non-technical audiences
  • You are comfortable working across multiple projects and adapting to new evaluation challenges

Nice To Haves

  • Adversarial Machine Learning: jailbreak datasets, prompt injection attacks, RLHF/DPO vulnerabilities, or model extraction techniques
  • Cybersecurity: penetration testing, exploit development, reverse engineering
  • Socio-technical risk analysis: harassment or misinformation testing, abuse pattern analysis
  • Creative adversarial thinking: backgrounds in psychology, acting, writing, or other disciplines that support unconventional attack strategies

Responsibilities

  • Red-team AI models and agents by testing jailbreak attempts, prompt injections, misuse scenarios, and exploit strategies
  • Generate high-quality human evaluation data by annotating model failures, classifying vulnerabilities, and identifying systemic risks
  • Apply structured testing methodologies using taxonomies, benchmarks, and playbooks to ensure consistent evaluation
  • Document findings clearly and reproducibly, producing reports, datasets, and adversarial test cases that teams can act upon
  • Work across multiple projects, supporting different AI systems and evaluation objectives