Research Engineer, Agentic Retrieval (North America)

Qdrant
San Francisco, CA (Remote)

About The Position

Qdrant is an open-source vector search engine powering the next generation of AI applications, from semantic search and retrieval-augmented generation (RAG) to AI agents and real-time recommendations. As a remote-first company, we believe diverse backgrounds, perspectives, and experiences fuel innovation. Here, you’ll own meaningful work, tackle challenges, and grow alongside passionate individuals dedicated to shaping the future of AI.

We are looking for a Research Engineer, Agentic Retrieval. You'll work at the seam between agent systems research and retrieval engineering, running a tight loop between hypothesis, experiment, and shipped artifact. The questions you'll chase may not have settled answers yet: how agents should structure memory, when they should re-query versus reason, how skills and tools should be retrieved and composed, what retrieval primitives the agent loop actually needs, and what "good" even means when success is a multi-step trajectory rather than a ranked list.

You'll go deep on how real agent stacks use Qdrant today, where the abstractions around them help or hurt, and what we should build (or change) so they can do more with less. The agent ecosystem moves fast, and part of the job is staying current with it without getting captured by it. You'll have a lot of latitude to choose what to investigate. The bar is the same either way: every cycle should produce something the field, our customers, or the rest of the company can act on.

Requirements

  • You read and reason about LLM behavior directly. You can distinguish prompt issues from planning issues from retrieval issues from tool design issues, and you've internalized how models actually use retrieved content versus ignore it.
  • You treat memory as a systems design problem. You distinguish episodic, semantic, and procedural memory, and you know naive "store every turn as a vector" approaches collapse fast.
  • You understand tool and skill systems as retrieval problems. You see tool selection and skill matching as ranking problems with their own quirks: tiny corpora, heavy metadata, strong priors, sensitivity to descriptions.
  • You have a working theory of context engineering. You think carefully about what goes into the context window and why, and you understand that retrieval quality and context construction are the same problem from two angles.
  • You build evals before features. You know how to construct task suites that actually discriminate between approaches, and how to avoid leaning on recall@k alone.
  • You know vector search internals at a decent level. HNSW tradeoffs, quantization, filtered search, multi-vector, hybrid retrieval, payload indexing. Enough to design agent patterns that exploit Qdrant's primitives instead of treating the database as a black box.
  • You write precisely. You can describe a memory architecture or failure mode clearly enough that other engineers can implement from it.

Nice To Haves

  • Contributions to agent stacks, skill systems, MCP servers, RAG tooling, or eval harnesses.
  • Experience designing agent benchmarks or running them at scale.
  • Familiarity with Qdrant or comparable vector search systems under production agent traffic.
  • Track record working with design-partner customers or open-source community contributors.

Responsibilities

  • Define what good agentic retrieval looks like. Characterize the retrieval patterns inside real agent loops, name the failure modes, and turn that vocabulary into something the team and the field can build against.
  • Treat agent memory as a systems problem. Episodic, semantic, and procedural memory each need different write paths, decay, and consolidation. Figure out which architectures hold up at scale and turn the durable patterns into reference implementations.
  • Investigate skill and tool retrieval as a first-class problem. Work out how a skill registry should be indexed, how skills should be selected under tool budgets, and how retrieval should compose with planner decisions.
  • Design and run experiments on retrieval inside agent loops: query rewriting and decomposition, multi-hop retrieval, tool-conditioned filtering, retrieval-as-a-tool patterns, and the interplay between planner, retriever, and reranker.
  • Build evaluation infrastructure for agentic retrieval. Define metrics that correlate with end-to-end task success rather than recall@k, and build harnesses that catch regressions before they ship.
  • Profile agent retrieval traces end to end. Isolate where latency, cost, and quality losses come from across the fan-out of tool calls, and produce minimal reproductions when something looks like an engine-level issue.
  • Study how real agent stacks use Qdrant in production. Trace workloads, find where the surrounding abstractions leak performance or quality, and propose changes in Qdrant, in the stack, or in the recipe between them.
  • Pair with design-partner teams running serious agent workloads in production, and bring their real constraints back into research priorities.
  • Influence the roadmap. Translate evidence into product bets and argue for what should be a feature, a primitive, or a recipe.

Benefits

  • A competitive salary with additional perks.
  • Flexible working hours and async-friendly culture.
  • High ownership and real impact.
  • Open-source, engineering-driven culture.
  • Choose your own laptop equipment.
  • 401k match, health, dental, and vision insurance, plus flexible PTO policy.