Applied Research - Forward-Deployed

Prime Intellect · San Francisco, CA (Hybrid)

About The Position

We're looking for a Forward-Deployed Research Engineer (FDRE) to serve as the primary technical interface between Prime Intellect and our most important customers: AI companies, research labs, and enterprises running post-training and agentic RL on our platform. This is not a traditional research role. You'll spend most of your time embedded with customers, understanding their models, workflows, and goals, then translating those goals into concrete training runs, environment designs, evaluation harnesses, and deployment recipes using the Lab stack. You are the person who makes the platform work in practice for real workloads.

You'll also work closely with our research, product, and infrastructure teams to feed field insights back into the platform, shaping what we build next based on what customers actually need.

Requirements

  • Deep hands-on experience building, evaluating, or deploying LLM-based agents in the past 1–2 years — you've seen what breaks in production and know what good evals look like
  • Strong intuition for evaluation design: you can look at a customer's agent and quickly identify what to measure, how to construct a rubric, and where the reward signal is weak
  • Working understanding of RL and post-training concepts (GRPO, RLHF, reward modeling, SFT) — you don't need to have written a trainer from scratch, but you should understand what the knobs do and why they matter
  • Strong Python skills and comfort with the modern AI stack (Hugging Face, inference engines, agent frameworks)
  • Experience in a customer-facing or consulting-adjacent technical role, or as a technical founder — you're comfortable in a room with a customer's engineering team figuring out what to build
  • Excellent written and verbal communication — you can write a clear environment spec, a compelling case study, and a useful Slack message to a frustrated customer
  • High agency and comfort with ambiguity. You don't wait for specs; you scope the problem, ship a solution, and iterate

Nice To Haves

  • Experience with agent frameworks and tooling (DSPy, LangGraph, MCP, Stagehand, browser automation)
  • Experience building or running LLM evaluation pipelines at scale (benchmarks, synthetic data generation, model grading)
  • Research experience — publications, open-source contributions, or benchmarks in ML/RL/agents
  • Familiarity with sandbox/code execution environments for agent evaluation
  • Web programming experience (React, TypeScript, Next.js) for building demos and customer-facing tooling

Responsibilities

  • Embed directly with strategic customers to understand their agent architectures, failure modes, and product goals
  • Design and build custom RL environments, evaluation harnesses, and verifiers that capture what "good" looks like for each customer's domain
  • Architect agent scaffolding — tool use, multi-step reasoning, memory, sandbox execution — tailored to customer workflows
  • Configure and launch training runs on Lab, iterating on reward functions, rollout strategies, and evaluation criteria
  • Serve as the end-to-end technical lead for each engagement, from initial discovery through to deployed, improved models
  • Identify repeatable patterns from customer engagements and codify them into reference implementations, templates, and documentation
  • Serve as the voice of the customer internally, shaping the roadmap for Lab, verifiers, the Environments Hub, and training infrastructure
  • Build high-quality examples and "recipes" that make it easy for new customers and open-source contributors to extend the stack
  • Contribute to technical content (blog posts, tutorials, case studies) that demonstrates real-world platform usage
  • Develop novel evaluation methodologies for agentic behavior — multi-step reasoning, tool use correctness, recovery from failure, long-horizon task completion
  • Prototype and iterate on agent harnesses for real-world tasks: code generation, workflow automation, document processing, and more
  • Experiment with reward design, rubric construction, and environment shaping to improve training signal quality
  • Stay current on the frontier of agentic AI, evals, and post-training methods, and bring that knowledge directly into customer work

Benefits

  • Competitive compensation + equity incentives
  • Flexible work (San Francisco or hybrid-remote)
  • Visa sponsorship & relocation support
  • Professional development budget
  • Team off-sites & conference attendance