Senior Applied ML Research Engineer, Agentic Security

Dynamo AI
San Francisco, CA
Remote

About The Position

You'll help define an emerging area: how to find and neutralize the security risks that arise when agents act, plan, and use tools autonomously. This role is both research-heavy and engineering-heavy: you'll design experiments, build prototypes, fine-tune models, and pressure-test systems against adversarial behavior. You'll iterate quickly, learn from failures, and scale what works, while building the monitoring and evaluation infrastructure that makes progress measurable.

Requirements

  • MS or PhD in CS/ML (or equivalent research experience)
  • Experience fine-tuning and evaluating models in practice
  • Ability to reason about data quality, overfitting, evals, and deployment constraints
  • Ability to write strong production code
  • Comfortable owning the infrastructure that makes agentic evals run end-to-end
  • Care about reproducibility and instrumentation
  • Motivated by security problems and comfortable thinking like both a builder and an attacker
  • Ability to reason about how capabilities combine into risk: not just individual vulnerabilities, but system-level attack surfaces across tool ecosystems
  • Ability to communicate clearly, iterate fast, and hold a technical narrative from "hypothesis" to "shipped"

Nice To Haves

  • You enjoy working under uncertainty
  • You produce careful, original work rather than low-effort, AI-generated output

Responsibilities

  • Define and validate threat models for agentic systems, identifying which tool characteristics must co-exist to enable data exfiltration and malicious state change, and how to break those combinations
  • Design and run experiments: build synthetic environments (such as file systems and tools), construct task distributions that contain attack paths, and apply different attack strategies
  • Break agentic systems, both manually and with optimization methods such as RL
  • Design and improve static and dynamic analysis methods that automatically map tool capabilities to risk across diverse tool ecosystems, and make those methods scale
  • Turn research insights into product-facing capabilities: risk classification, automated guardrail generation, and quantitative threat measurement
  • Build measurement tools: eval harnesses, monitoring, dashboards, and feedback loops that quantify security outcomes
  • Build capability and regression evals
  • Optimize systems for real-world constraints (latency, cost, reliability) without losing scientific rigor

Benefits

  • Competitive salary + equity
  • Work at the forefront of AI security, helping define a new category
  • Remote-friendly
  • Fully funded team retreats every 8 weeks
  • Health insurance allowance for you and your dependents
  • Wellbeing, learning, and home office allowances