Reinforcement Learning Infrastructure (Cybersecurity)

Bugcrowd

$176,400 - $242,550 • Remote

About The Position

The Bugcrowd RL and Reasoning Team pushes the boundaries of autonomous cybersecurity by building authentic reinforcement learning (RL) environments for foundation model companies. Our team works at the intersection of AI, security research, and systems engineering, building environments in which models learn skills such as vulnerability discovery, exploitation, and remediation.

As a Staff Engineer, you will help advance the frontier of AI reinforcement learning development and delivery. This role is unique: instead of building a single application, you will build the infrastructure and tooling that transform real-world vulnerability research into thousands of large-scale RL environments used to train next-generation AI systems. These environments teach AI systems how to hack and defend software, so your work will directly influence the capabilities of the next generation of AI models.

Requirements

Strong systems engineering background.
Understanding of reinforcement learning workflows, including the RL training workflows used by modern LLM systems.
Experience building clean, reproducible Linux ML environments (containers, MCP, etc.).
System security background in binary exploitation, such as buffer overflows, fuzzing, and exploit development on x86/x86-64.
Proficiency in Python and C, including experience developing applications in both.
Experience with DevOps pipelines (e.g., GitHub Actions) and reproducible builds (Docker, BuildKit, Nix).
Understanding of software vulnerabilities, fuzzing, or program analysis.
Experience with build systems and large open-source codebases.
Comfort working with Linux systems and low-level debugging.

Nice To Haves

Experience with other languages, especially Rust, is a plus.
Experience working with benchmark environments (CTFs, SWE-bench, security challenges, etc.) is a plus.

Responsibilities

Design pipelines that ingest software projects, analyze them with Bugcrowd’s Mayhem platform, and automatically construct training environments used by frontier AI labs, including Anthropic, OpenAI, and Cohere.
Build the systems that generate RL environments, not just the environments themselves, producing thousands of environments used to train frontier AI systems.
Build the infrastructure and tooling that transforms real-world vulnerability research into large-scale RL environments, advancing the frontier of AI reinforcement learning development and delivery.
Create training environments that teach AI systems how to hack and defend software, covering skills such as vulnerability discovery, exploitation, and remediation.

Benefits

Discretionary bonus program or commission plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.