Jailbreaking Lead (Red Team)

FAR.AI•,

9d•$170,000 - $250,000•Remote

About The Position

FAR.AI is a non-profit AI research institute dedicated to ensuring advanced AI is safe and beneficial for everyone. Our mission is to facilitate breakthrough AI safety research, advance global understanding of AI risks and solutions, and foster a coordinated global response. FAR.AI’s red team is building toward a simple outcome: materially raising the bar for safety and security of the most widely deployed and capable AI systems in the world. We intend to be the tip of the spear in AI safety: the team that consistently finds the failures others miss, resulting in real mitigations, and setting the standard that labs and governments converge on. We also leverage our in-depth understanding of weaknesses in frontier models to advise frontier developers on mitigations, to guide our own research and grantmaking for improving model security, and to inform the public of key AI risks. As the Jailbreaking Lead, you will be the senior technical owner of the jailbreaking practice, reporting to Kellin Pelrine with a dotted line to Edward Yee. In 2026, FAR.AI is scaling its impact by red-teaming all major frontier model releases, expanding strategic engagements with governments, deepening testing in key risk areas, and building tools to raise the global standard for red-teaming. This role is primarily a senior individual contributor (IC) role, focused on personally breaking the hardest targets, setting the team's technical bar, and discovering high-severity, universal vulnerabilities in frontier models. Approximately 50-70% of your time will be hands-on model breaking, inventing new techniques, and defining vulnerability severity. The remaining time will be dedicated to managing/mentoring ICs, shaping the research agenda, and ensuring findings translate into real-world impact with labs and governments. A management track is available for candidates interested in leading a team, but the technical IC bar will not be lowered.

Requirements

Obsession with frontier model jailbreaks, similar to elite security researchers obsessing over zero-days.
Track record (public, private, or both) of finding non-obvious, high-severity vulnerabilities in frontier AI systems, including universal or near-universal jailbreaks in heavily defended risk domains.
Deep technical craft combined with the judgment to identify which vulnerabilities matter and how defenses are constructed across different frontier models.
Communication skills to make frontier labs and governments act on findings.
Excitement for high-stakes, real-world technical work where success is measured by mitigations adopted and standards shifted.
Desire to work with leading AI companies, governments, and academics.
Value independence and the ability to publish and speak honestly about risks.
Deep care about AI safety and impacting how advanced AI systems are deployed.
A “get shit done” attitude and willingness to do whatever it takes to change the world.
Thrive in fast-moving, ambiguous environments with shifting threat models and defenses.
Personally developed universal or near-universal jailbreaks against at least one leading frontier model.
Demonstrated ability to discover non-obvious, high-severity vulnerabilities in frontier AI systems, complex software systems, or other hardened adversarial targets.
Deep, hands-on jailbreaking experience with demonstrated success against modern frontier models with layered defenses, including chaining multiple attack techniques through defense-in-depth stacks.
Experience with black-box optimisation methods, multimodal attacks, and/or agentic red-teaming.
Deep understanding of large language model architectures, training processes, and failure modes, including how these factors influence model behavior under adversarial conditions.
Strong existing track record in AI, adversarial ML, security, or another highly technical subject (e.g. CS, cybersecurity, math, physics).
Thrived in rapidly evolving environments where techniques go obsolete fast and you have to invent your way forward.
Invented novel attack classes.
Demonstrated drive for mission/impact and desire to create real impact on frontier AI systems.
Demonstrated relentlessness in achieving ambitious goals.

Nice To Haves

Prior collaboration with AI labs, security teams, or government safety institutes.
A track record in top CTF teams, offensive security research, or adversarial ML research.
Published work in AI safety, security, or robustness.
Can communicate technical findings and recommended mitigations to both technical and non-technical audiences, including frontier lab safety teams and senior policymakers.
Prior experience mentoring technical ICs or leading a small technical team (required only for the management track).

Responsibilities

Lead jailbreaking on the highest-stakes engagements, personally developing universal jailbreaks against frontier models in CBRNE, cyber, agentic security, extreme persuasion, and emerging risk domains.
Systematically dismantle defense-in-depth stacks (input filters, model-level refusal, reasoning monitors, output filters, account-level moderation), chaining novel and established techniques.
Escalate initial vulnerabilities to expose their most severe form, maximising universality, success rate, and capability of elicited output.
Own the technical bar for vulnerability severity and generality on every major engagement.
Invent new attack classes when existing techniques fail.
Monitor and rapidly incorporate state-of-the-art methods from the literature, and build a proprietary portfolio of techniques.
Shape the jailbreaking research agenda in partnership with Kellin, ensuring the toolkit stays ahead as defenses evolve.
Stress-test novel affordances (agents, tool use, long context, multimodal, reasoning, etc.) as frontier systems evolve.
Set the standard for rigor, creativity, and precision in jailbreaking across the red team.
Mentor ICs on attack craft, running pairing sessions, post-engagement retros, and internal writeups.
Review major red-teaming deliverables for technical quality, severity judgment, and clarity.
If on the management track: hire, manage, and grow a jailbreaking team without sacrificing personal technical edge.
Work directly with frontier labs and government agencies so that findings lead to real mitigations.
Contribute to public reports, benchmarks, and the FAR.AI safety leaderboard that shape industry norms.
Make precise, calibrated technical judgments about what is universal, what is reliable, and what a capable threat actor could actually do with a finding.

Benefits

USD 170,000–250,000 compensation, depending on experience. Exceptional candidates may be offered more.
Sponsorship for US or Singapore visas.
Opportunity to work with leading AI companies, governments, and academics.
Independence and ability to publish and speak honestly about risks.
Opportunity to make a real impact on frontier AI systems and AI safety.

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume