MeMo: Memory as a Model

Source: arXiv:2605.15156
Authors: Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, Arun Verma, et al. (NUS, A*STAR, MIT)
Date: May 2026

TL;DR

MeMo introduces a framework that augments any frozen LLM with domain-specific or up-to-date knowledge via a trained memory model — a small language model that internalises knowledge from a target corpus. The frozen "executive" LLM queries the memory model through a structured multi-turn protocol, avoiding costly retraining, catastrophic forgetting, retrieval noise, and context window limits, while working with black-box proprietary APIs.

The Problem

LLMs have static knowledge cutoffs. Existing solutions all have critical trade-offs:

| Approach | Issue |
| --- | --- |
| RAG / ICL | Constrained by context windows, sensitive to retrieval noise, poor cross-document synthesis |
| Fine-tuning (CPT/SFT) | Catastrophic forgetting, expensive, incompatible with black-box APIs |
| Latent memory | Representation coupling ties memory to a specific model family |

The MeMo Framework

Three distinct models work together:

| Model | Role | Size |
| --- | --- | --- |
| GENERATOR | Distills corpus into QA "reflections" during training | 32B (white-box) |
| MEMORY | Internalises corpus knowledge via SFT on reflections | 1.5B–14B |
| EXECUTIVE | Reasons over user query; queries MEMORY model | 32B+ (black-box OK) |

Key Advantages

Cross-document relationships: Captures complex synthesis across documents.
Robust to retrieval noise: No external retrieval index needed.
No catastrophic forgetting: Executive model parameters stay unchanged.
Black-box compatible: No weights, logits, or gradients required from the executive model — works with GPT-4, Claude, Gemini.
Fixed inference cost: The cost of querying the memory model is independent of corpus size.

Data Synthesis Pipeline

The key innovation is generating reflections — compositional QA pairs that expose underlying corpus knowledge. Five steps:

1. Fact extraction: direct and inferred facts from each document chunk.
2. Consolidation: merge related pairs into multi-fact questions.
3. Verification & rewriting: discard non-self-contained pairs.
4. Entity surfacing: generate QA requiring entity inference from attributes.
5. Cross-document synthesis: identify converging clues and parallel properties across documents.

Ablation finding: Removing Step 5 (cross-document synthesis) collapses accuracy on NarrativeQA, from 24.00% to 6.37%.
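The control flow of the five steps above can be sketched as a staged pipeline. This is a minimal illustration only: the `QAPair` type, its `entity` field, the `Step` signature, and every `toy_*` function are hypothetical names, and simple string rules stand in for what would be prompts to the GENERATOR model. The two-chunk corpus is invented for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QAPair:
    entity: str               # subject of the pair (hypothetical field)
    question: str
    answer: str
    sources: Tuple[str, ...]  # ids of the chunks the pair draws on


# Hypothetical interface: a step maps the current pool of pairs to a new pool.
Step = Callable[[List[QAPair]], List[QAPair]]


def build_reflections(chunks: Dict[str, str],
                      extract: Callable[[str, str], List[QAPair]],
                      steps: List[Step]) -> List[QAPair]:
    """Run step 1 per chunk, then steps 2-5 in order over the pooled pairs."""
    pool: List[QAPair] = []
    for chunk_id, text in chunks.items():
        pool.extend(extract(chunk_id, text))
    for step in steps:
        pool = step(pool)
    return pool


def toy_extract(chunk_id: str, text: str) -> List[QAPair]:
    # Step 1: one pair per sentence; the first word is treated as the entity.
    pairs = []
    for sentence in (s.strip() for s in text.split(".")):
        if sentence:
            entity, _, rest = sentence.partition(" ")
            question = f"What is said about {entity}?"
            pairs.append(QAPair(entity, question, rest, (chunk_id,)))
    return pairs


def toy_consolidate(pool: List[QAPair]) -> List[QAPair]:
    # Step 2: merge pairs about the same entity into one multi-fact pair.
    merged: Dict[str, QAPair] = {}
    for p in pool:
        old = merged.get(p.entity)
        if old is None:
            merged[p.entity] = p
        else:
            sources = tuple(sorted(set(old.sources + p.sources)))
            answer = f"{old.answer}; {p.answer}"
            merged[p.entity] = QAPair(p.entity, old.question, answer, sources)
    return list(merged.values())


def toy_verify(pool: List[QAPair]) -> List[QAPair]:
    # Step 3: drop pairs that hinge on an unresolved pronoun.
    return [p for p in pool if p.entity.lower() not in {"it", "this", "they"}]


def toy_surface_entities(pool: List[QAPair]) -> List[QAPair]:
    # Step 4: add pairs that ask for the entity given its attributes.
    extra = [QAPair(p.entity, f"Who or what {p.answer}?", p.entity, p.sources)
             for p in pool]
    return pool + extra


def toy_cross_document(pool: List[QAPair]) -> List[QAPair]:
    # Step 5: link pairs from different chunks when an answer names another entity.
    facts = [p for p in pool if p.answer != p.entity]  # skip step-4 pairs
    extra = []
    for a in facts:
        for b in facts:
            linked = b.entity in a.answer.split()
            if linked and set(a.sources).isdisjoint(b.sources):
                question = (f"What is said about {b.entity}, "
                            f"the entity mentioned alongside {a.entity}?")
                extra.append(QAPair(a.entity, question, b.answer,
                                    a.sources + b.sources))
    return pool + extra


chunks = {
    "doc1": "Ada wrote the first program. Ada worked with Babbage.",
    "doc2": "Babbage designed the Analytical Engine. It was never finished.",
}
reflections = build_reflections(
    chunks, toy_extract,
    [toy_consolidate, toy_verify, toy_surface_entities, toy_cross_document],
)
for pair in reflections:
    print(pair.sources, "|", pair.question, "->", pair.answer)
```

In this toy run only the last pair draws on both chunks, which is the kind of pair that Step 5 contributes and that the ablation removes.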

Continual Knowledge Integration

When new corpora arrive, MeMo uses model merging (TIES, DARE, SLERP) to combine separately trained memory models, so total training cost scales as O(K) in the number of corpora K, rather than the O(K²) of full retraining on the accumulated corpora. At K=10 corpora, this is a ~5.5× saving (240 vs 1,320 GPU-hours).
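The quoted figures are consistent with a simple cost model, sketched below under two assumptions that are not stated in this summary: each corpus costs a fixed 24 GPU-hours to train on (inferred from 240 GPU-hours / 10 corpora), and full retraining after the i-th arrival costs i times that amount. The function names are hypothetical, and the cost of the merge operation itself is ignored.

```python
def merge_cost(k: int, hours_per_corpus: float) -> float:
    """Train one memory model per corpus, then merge: linear in k."""
    return k * hours_per_corpus


def retrain_cost(k: int, hours_per_corpus: float) -> float:
    """Retrain on all corpora seen so far at every arrival: quadratic in k."""
    return sum(i * hours_per_corpus for i in range(1, k + 1))


HOURS_PER_CORPUS = 24  # assumption: inferred from 240 GPU-hours / 10 corpora

merged = merge_cost(10, HOURS_PER_CORPUS)
retrained = retrain_cost(10, HOURS_PER_CORPUS)
print(merged, retrained, retrained / merged)  # 240 1320 5.5
```

Under this model the retraining total is 24 × (1 + 2 + ... + 10) = 24 × 55 = 1,320 GPU-hours, which matches the quoted number; the paper's own accounting may differ in detail.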

Inference Protocol

Grounding stage: Executive decomposes user query into atomic sub-questions.
Entity identification: Iteratively narrows candidate entities via multi-turn sub-queries to the memory model.
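The two stages above can be sketched as a short loop. This is a hedged illustration, not the paper's protocol: `identify_entity`, `Decompose`, `AskMemory`, and the `max_turns` cap are hypothetical, and the memory model is simplified to return a set of candidate entity names, whereas a real memory model would return text that the executive has to interpret.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical interfaces: both models are seen only through text in, data out,
# so the executive can be a black-box API.
Decompose = Callable[[str], List[str]]  # executive: query -> atomic sub-questions
AskMemory = Callable[[str], Set[str]]   # memory: sub-question -> candidate entities


def identify_entity(query: str, decompose: Decompose, ask_memory: AskMemory,
                    max_turns: int = 5) -> Set[str]:
    """Narrow the candidate set by intersecting answers over successive turns."""
    candidates: Optional[Set[str]] = None
    for sub_question in decompose(query)[:max_turns]:
        answer = ask_memory(sub_question)
        candidates = answer if candidates is None else candidates & answer
        if len(candidates) <= 1:  # resolved, or no entity fits all clues
            break
    return candidates or set()


# Toy stand-ins: a lookup table plays the memory model, and a fixed list of
# sub-questions plays the executive's decomposition.
TOY_MEMORY: Dict[str, Set[str]] = {
    "Who worked on the Analytical Engine?": {"Ada", "Babbage"},
    "Who wrote the first program?": {"Ada"},
}


def toy_decompose(query: str) -> List[str]:
    return list(TOY_MEMORY)


def toy_ask_memory(sub_question: str) -> Set[str]:
    return TOY_MEMORY.get(sub_question, set())


result = identify_entity(
    "Who wrote the first program for the Analytical Engine?",
    toy_decompose, toy_ask_memory,
)
print(result)
```

Intersection is just one possible narrowing rule; it is used here because it makes the "iteratively narrows" behaviour easy to see on a toy example.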