Member of Technical Staff, Research Engineer

Plato • San Francisco, CA

About The Position

Plato is an applied research lab focused on building the foundational infrastructure for training specialized AI agents. The company's mission is to transform real-world data streams into high-fidelity simulated environments that generate the necessary training signals for capable AI models. This work is crucial for frontier labs, hyperscalers, and enterprises developing AI systems for complex and high-stakes applications. While compute and algorithms are becoming commoditized, the availability of reinforcement learning data remains a significant bottleneck. Plato aims to address this by automating the scaling of training environments using proprietary real-world data.

Requirements

Strong implementation ability, including turning ambiguous research ideas into working systems.
Experience with RL, LLM agents, computer-use agents, evals, post-training, synthetic data, simulation, or model behavior analysis.
Care deeply about whether a task is grounded, difficult, reward-hack-resistant, and capable of producing actual learning signal.
Comfortable interpreting ambiguous model behavior and negative results.
Enjoy building continuous research loops rather than static benchmark artifacts.

Responsibilities

Design experiments.
Build task generation systems.
Run evaluations.
Inspect model failures.
Develop methods for mining tasks that are just out of reach of today's agents.
Discover model failure modes from real-world traces, agent telemetry, targeted researcher hypotheses, and customer workflows.
Generate realistic curricula grounded in actual workflows rather than toy synthetic benchmarks.
Benchmark candidate tasks against frontier CUA and agent models using pass rates, rollouts, and behavioral traces as difficulty signals.
Build hill-climbing loops that mutate, filter, and rescore tasks until they surface high-signal targets.
Study reward hackability, distribution mismatch, task realism, long-horizon failures, and transfer from simulation to deployed agents.
Turn research prototypes into reliable internal systems for continuous curriculum generation.
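
For candidates who want a concrete picture of the hill-climbing loops mentioned above, here is a minimal, illustrative sketch. It is not Plato's implementation. Everything in it is a hypothetical stand-in: a task is reduced to a single `difficulty` number, `pass_rate` simulates agent rollouts with coin flips instead of calling a real model, and the 20% target pass rate is an assumed proxy for "just out of reach."

```python
import random


def pass_rate(difficulty, rng, rollouts=200):
    """Toy stand-in for agent rollouts: harder tasks pass less often.

    A real system would run an agent on the task; here each rollout is a
    coin flip with success probability (1 - difficulty), clipped to [0, 1].
    """
    p_success = max(0.0, min(1.0, 1.0 - difficulty))
    passes = sum(rng.random() < p_success for _ in range(rollouts))
    return passes / rollouts


def signal(rate, target=0.2):
    """Assumed scoring rule: the closer to the target pass rate, the better."""
    return -abs(rate - target)


def hill_climb(seed_difficulty, steps, rng):
    """Mutate, filter, and rescore a task, keeping the best candidate seen."""
    best = seed_difficulty
    best_rate = pass_rate(best, rng)
    for _ in range(steps):
        # Mutate: nudge the task's difficulty, staying inside [0, 1].
        candidate = max(0.0, min(1.0, best + rng.uniform(-0.1, 0.1)))
        # Rescore: estimate the candidate's pass rate from fresh rollouts.
        rate = pass_rate(candidate, rng)
        # Filter: a task that never passes gives no learning signal.
        if rate == 0.0:
            continue
        if signal(rate) > signal(best_rate):
            best, best_rate = candidate, rate
    return best, best_rate


if __name__ == "__main__":
    difficulty, rate = hill_climb(0.1, steps=200, rng=random.Random(0))
    print(f"difficulty={difficulty:.2f} pass_rate={rate:.2f}")
```

In practice the mutation step would edit a real task specification, and the filter would also need to screen for reward hackability and realism, which a one-number toy cannot capture.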