MeMo: Memory as a Model¶
Source: arXiv:2605.15156 \ Authors: Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, Arun Verma, et al. (NUS, A*STAR, MIT) \ Date: May 2026
TL;DR¶
MeMo introduces a framework that augments any frozen LLM with domain-specific or up-to-date knowledge via a trained memory model — a small language model that internalises knowledge from a target corpus. The frozen "executive" LLM queries the memory model through a structured multi-turn protocol, avoiding costly retraining, catastrophic forgetting, retrieval noise, and context window limits, while working with black-box proprietary APIs.
The Problem¶
LLMs have static knowledge cutoffs. Existing solutions all have critical trade-offs:
| Approach | Issue |
|---|---|
| RAG / ICL | Constrained by context windows, sensitive to retrieval noise, poor cross-document synthesis |
| Fine-tuning (CPT/SFT) | Catastrophic forgetting, expensive, incompatible with black-box APIs |
| Latent memory | Representation coupling ties memory to a specific model family |
The MeMo Framework¶
Three distinct models work together:
| Model | Role | Size |
|---|---|---|
| GENERATOR | Distills corpus into QA "reflections" during training | 32B (white-box) |
| MEMORY | Internalises corpus knowledge via SFT on reflections | 1.5B–14B |
| EXECUTIVE | Reasons over user query; queries MEMORY model | 32B+ (black-box OK) |
Key Advantages¶
- Cross-document relationships: Captures complex synthesis across documents.
- Robust to retrieval noise: No external retrieval index needed.
- No catastrophic forgetting: Executive model parameters stay unchanged.
- Black-box compatible: No weights, logits, or gradients required from the executive model — works with GPT-4, Claude, Gemini.
- Fixed inference cost: Retrieval cost is independent of corpus size.
Data Synthesis Pipeline¶
The key innovation is generating reflections — compositional QA pairs that expose underlying corpus knowledge. Five steps:
- Fact extraction — direct and inferred facts from each document chunk.
- Consolidation — merge related pairs into multi-fact questions.
- Verification & rewriting — discard non-self-contained pairs.
- Entity surfacing — generate QA requiring entity inference from attributes.
- Cross-document synthesis — identify converging clues and parallel properties across documents.
Ablation finding: Removing Step 5 collapses accuracy (6.37% vs 24.00% on NarrativeQA).
Continual Knowledge Integration¶
When new corpora arrive, MeMo uses model merging (TIES, DARE, SLERP) to combine separately trained memory models — scaling O(K) instead of O(K²) for full retraining. At K=10 corpora, this is ~5.5× savings (240 vs 1,320 GPU-hours).
Inference Protocol¶
- Grounding stage: Executive decomposes user query into atomic sub-questions.
- Entity identification: Iteratively narrows candidate entities via multi-turn sub-queries to the memory model.