AI Research Engineer

Dropzone AI

1d•Remote

About The Position

We are seeking a Senior to Principal-level AI Research Engineer to lead the design and development of next-generation agentic AI systems. This role sits at the intersection of research and production, with a strong emphasis on agent architecture design, harness and memory engineering, and robust evaluation and benchmarking of model and agent performance. You will work closely with product and engineering teams to translate cutting-edge research into scalable, real-world systems. In this role, you will directly shape the core intelligence layer of Dropzone AI. Your work will define how our agents reason, remember, and improve over time, influencing both our product capabilities and the broader direction of applied AI systems.

Requirements

5+ years in software engineering, with at least 1 year applying GenAI in production
Proven experience building or researching: Agent frameworks / tool-using LLMs
Proven experience building or researching: Memory / retrieval systems (RAG, vector DBs, hybrid retrieval)
Expert Python developer
Familiar with the OpenClaw and Claude Code harness architectures
Early-stage startup mindset. You thrive on ambiguity and execute quickly

Nice To Haves

Experience with agent orchestration frameworks (LangGraph, AutoGen, custom systems)
Familiarity with AI safety guardrails, hallucination mitigation, and structured output enforcement
Experience designing LLM evals (offline + online, human-in-the-loop, synthetic data)
Publications or open-source contributions in relevant areas
Experience applying the latest context/harness engineering techniques to customer-facing products
Experience as a founder or early-stage engineer (first 10 engineers), or in standing up a new technology bet within a more established company

Responsibilities

Design and implement advanced multi-step reasoning agents (tool use, planning, reflection, self-improvement loops)
Develop frameworks for multi-agent coordination and task decomposition
Improve reliability, latency, and cost efficiency of agent execution
Architect short-term and long-term memory subsystems (episodic, semantic, retrieval-based, hybrid)
Build mechanisms for context compression, retrieval, and grounding
Explore novel approaches to continual learning and state persistence
Define and implement evaluation frameworks for agent performance (task success, reasoning quality, robustness)
Build automated eval pipelines (synthetic data, adversarial testing, regression testing)
Establish metrics and benchmarks for agent reliability in production
Translate the latest research ideas from the community into production-grade systems
Run experiments, analyze results, and iterate quickly
Contribute to internal knowledge sharing and technical direction