Applied Reinforcement Learning Engineer 2

Centific • Redmond, WA

1d • Hybrid

About The Position

Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. Our mission is to transform data, signals, and human insight into next-generation intelligent systems that redefine enterprise intelligence. We're building a governed RL environment platform that enables enterprises to safely iterate and improve AI agent workflows through simulation-based learning, bridging human-labeled signal creation with automated RL training for high-stakes operations.

As an Applied RL Engineer, you will design and build RL environments that simulate complex enterprise workflows and train intelligent agents within them. You'll work at the intersection of RL research and production systems, translating customer requirements into bespoke simulation environments and post-training pipelines that deliver measurable improvements to AI agent performance.

This role requires deep expertise in both classical RL methodologies and modern LLM-based agent architectures. You'll shape our product direction and help make RL accessible to enterprise customers who need safe, compliant ways to improve their AI systems.

Requirements

Deep RL expertise: 3+ years of hands-on experience with environment design, reward engineering, and policy optimization
LLM post-training: Experience fine-tuning LLMs using RLHF, DPO, PPO, or similar
Production skills: Software engineering experience beyond research settings, including scalable pipelines and training infrastructure
Agentic AI: Experience with LLM-based agents, tool use, and multi-step reasoning
Technical stack: Strong Python; Gymnasium, RLlib, Stable Baselines; PyTorch/JAX/TensorFlow
Education: MS/PhD in CS, ML, or related field (or equivalent experience)

Nice To Haves

Publications at NeurIPS, ICML, ICLR, ACL, or similar venues
Enterprise workflow experience in healthcare, finance, logistics, or compliance
Open-source contributions to CleanRL, TRL, veRL, or agent frameworks
Experience with world models, synthetic data generation, and simulation
Distributed training and large-scale RL experimentation

Responsibilities

Design and build custom RL environments (digital twins) simulating enterprise workflows: document processing, compliance, onboarding, support automation
Post-train LLM-based agents on domain-specific tasks using PPO, GRPO, DPO, and RLHF
Build end-to-end pipelines converting human-labeled traces into RL training data
Architect multi-step reasoning agents with tool-calling and closed learning loops
Design reward functions, verifiers, and validation frameworks for pre-deployment testing
Translate cutting-edge RL research into production systems; contribute to publications

Benefits

Lead the frontier: Shape a new discipline at the intersection of RL, simulation, and enterprise AI
Ship your science: See your research power real systems across healthcare, finance, and safety
Collaborate with leaders: Work alongside NVIDIA, Microsoft, and the global AI community
Build what matters: Create governed, compliant AI systems enterprises can trust
