Deep Research is now Open: Agent-ModernColBERT
Source: LightOn Blog
Date: 2026-05-12
TL;DR
Agent-ModernColBERT is a 149M-parameter late-interaction retriever that adds ~10% accuracy over Reason-ModernColBERT on BrowseComp-Plus by treating agent reasoning traces as first-class retrieval signals. Paired with GPT-OSS-120B, it matches the original GPT-5 + Qwen3-Embed-8B stack while the retriever is 54× smaller than an 8B-parameter dense embedder.
Core Idea: Don't Throw Away the Reasoning
In agentic search workflows, the agent typically decomposes the task, forms hypotheses, rules out dead ends, and decides what evidence it needs next. Standard pipelines discard this reasoning and search only on a rewritten keyword query.
AgentIR concatenates the agent's full reasoning trace directly into the retrieval query. When applied to late-interaction models, this yields a ~10% accuracy boost over the already state-of-the-art Reason-ModernColBERT.
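
The concatenation step can be sketched in a few lines. This is a minimal illustration, not the template used by AgentIR: the function name `build_agentic_query` and the `Reasoning:` / `Query:` labels are hypothetical, since the post does not specify the exact format.

```python
def build_agentic_query(reasoning_trace: str, search_query: str) -> str:
    """Join the agent's reasoning trace and its search query into one retrieval query.

    The "Reasoning:" / "Query:" labels are an assumed format for illustration only.
    """
    # Collapse runs of whitespace so multi-line traces become a single clean string.
    trace = " ".join(reasoning_trace.split())
    query = " ".join(search_query.split())
    if not trace:
        # No reasoning available: fall back to the plain query.
        return f"Query: {query}"
    return f"Reasoning: {trace}\nQuery: {query}"


agentic_query = build_agentic_query(
    "The author studied in Lyon.\nStill need the year the thesis was defended.",
    "thesis defense year Lyon",
)
print(agentic_query)
```

A standard pipeline would send only the second argument to the retriever; here the retriever also sees what the agent already knows and what evidence it is still missing.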
The Agent-ModernColBERT Stack
| Attribute | Detail |
|---|---|
| Parameters | 149M |
| Architecture | Late-interaction retriever (ModernColBERT) |
| Training data | DR-Synth (synthetic agent trajectories) |
| Training time | ~5 minutes |
| Paired LLM | GPT-OSS-120B |
| BrowseComp-Plus accuracy | 72.53% |
- The 72.53% accuracy exceeds the original GPT-5 + Qwen3-Embed-8B baseline reported for BrowseComp-Plus.
- It is competitive with AgentIR-4B, a dense model 26× larger that was trained on the same data and prompts.
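
The size comparisons can be checked with simple arithmetic. The sketch below assumes nominal parameter counts read off the model names (4B and 8B); the exact counts are not given in the post, so the results are only approximate.

```python
AGENT_MODERNCOLBERT_PARAMS = 149e6  # 149M, from the table above

# Assumed nominal sizes inferred from the model names, not exact parameter counts.
DENSE_PARAMS = {
    "AgentIR-4B": 4e9,
    "Qwen3-Embed-8B": 8e9,
}


def size_ratio(dense_params: float, small_params: float = AGENT_MODERNCOLBERT_PARAMS) -> float:
    """Return how many times larger the dense model is than the small retriever."""
    return dense_params / small_params


ratios = {name: size_ratio(params) for name, params in DENSE_PARAMS.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.1f}x larger than Agent-ModernColBERT")
```

Under these assumptions the ratios come out just under 27 and just under 54, which is in line with the 26× and 54× figures quoted in the post.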
Why Late Interaction Wins for Agentic Queries
A standard query is short: keywords, an entity, a question. An agentic query is rich, containing hypotheses, intermediate reasoning, constraints, and descriptions of missing evidence. Compressing all of that into a single vector loses signal.
Late-interaction models keep token-level representations and compare query tokens against document tokens at retrieval time. When the query contains a reasoning trace, this is especially powerful: different parts of the trace can match different pieces of evidence in the document.
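
The token-level comparison is commonly implemented as ColBERT-style MaxSim scoring: each query token takes its best match among the document tokens, and the per-token maxima are summed. The toy sketch below contrasts that with a single pooled vector. The 2-d embeddings are made up for illustration, and mean pooling is only a simplified stand-in for a dense encoder, not how any of the models named above actually work.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: sum over query tokens of the best-matching document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token vectors into one vector (simplified stand-in for a dense encoder)."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def pooled_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Single-vector scoring: compare one pooled query vector to one pooled document vector."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


s = math.sqrt(0.5)
query = [[1.0, 0.0], [0.0, 1.0]]  # two distinct needs expressed in the reasoning trace
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # contains precise evidence for both needs
doc_b = [[s, s], [s, s]]          # only loosely related to both needs

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, "maxsim:", maxsim_score(query, doc), "pooled:", pooled_score(query, doc))
```

In this toy setup MaxSim ranks `doc_a` first because each part of the query finds its own exact match, whereas the pooled score prefers `doc_b` because averaging blurs the two needs into one direction. Real models are far higher-dimensional, so this shows the mechanism rather than measuring the effect.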
Why It Matters
- Efficiency: 149M parameters and about 5 minutes of training, yet competitive with multi-billion-parameter dense retrievers.
- Open stack: The fully open combination of GPT-OSS-120B and Agent-ModernColBERT matches the closed GPT-5-based stack on BrowseComp-Plus.
- Architectural insight: The performance gap in agentic retrieval is increasingly driven by how much query signal you preserve, not by how many parameters you add.