MeMo: Memory as a Model

Source: arXiv:2605.15156
Authors: Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, Arun Verma, et al. (NUS, A*STAR, MIT)
Date: May 2026


TL;DR

MeMo introduces a framework that augments any frozen LLM with domain-specific or up-to-date knowledge via a trained memory model — a small language model that internalises knowledge from a target corpus. The frozen "executive" LLM queries the memory model through a structured multi-turn protocol, avoiding costly retraining, catastrophic forgetting, retrieval noise, and context window limits, while working with black-box proprietary APIs.

The Problem

LLMs have static knowledge cutoffs. Existing solutions all have critical trade-offs:

| Approach | Issue |
| --- | --- |
| RAG / ICL | Constrained by context windows, sensitive to retrieval noise, poor cross-document synthesis |
| Fine-tuning (CPT/SFT) | Catastrophic forgetting, expensive, incompatible with black-box APIs |
| Latent memory | Representation coupling ties memory to a specific model family |

The MeMo Framework

Three distinct models work together:

| Model | Role | Size |
| --- | --- | --- |
| GENERATOR | Distills corpus into QA "reflections" during training | 32B (white-box) |
| MEMORY | Internalises corpus knowledge via SFT on reflections | 1.5B–14B |
| EXECUTIVE | Reasons over user query; queries MEMORY model | 32B+ (black-box OK) |
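
As a rough illustration of how the EXECUTIVE and MEMORY roles might be wired together at inference time, here is a minimal sketch. It assumes each model is exposed as a plain text-in, text-out callable; the class name `MeMoSystem`, the prompt strings, the `DONE` stop token, and the `max_turns` cap are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: every model is just a text-in, text-out callable,
# which is all a black-box API exposes.
TextModel = Callable[[str], str]


@dataclass
class MeMoSystem:
    executive: TextModel  # frozen LLM, never fine-tuned
    memory: TextModel     # small model fine-tuned on reflections
    max_turns: int = 3    # assumed cap on sub-queries per user question

    def answer(self, user_query: str) -> str:
        notes: List[str] = []
        for _ in range(self.max_turns):
            # The executive proposes the next sub-question from what it knows so far.
            prompt = (
                f"QUESTION: {user_query}\nNOTES: {notes}\n"
                "NEXT SUB-QUESTION or DONE:"
            )
            sub_question = self.executive(prompt).strip()
            if sub_question == "DONE":
                break
            # The memory model answers from its parameters; no retrieval index is used.
            notes.append(f"{sub_question} -> {self.memory(sub_question)}")
        # The executive composes the final answer from the collected notes.
        return self.executive(
            f"QUESTION: {user_query}\nNOTES: {notes}\nFINAL ANSWER:"
        )
```

Because only text crosses the boundary between the two models, the executive can be swapped for any API-only model without touching the memory model.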

Key Advantages

  • Cross-document relationships: The memory model can synthesise knowledge spread across multiple documents.
  • Robust to retrieval noise: No external retrieval index needed.
  • No catastrophic forgetting: Executive model parameters stay unchanged.
  • Black-box compatible: No weights, logits, or gradients required from the executive model — works with GPT-4, Claude, Gemini.
  • Fixed inference cost: Retrieval cost is independent of corpus size.

Data Synthesis Pipeline

The key innovation is generating reflections — compositional QA pairs that expose underlying corpus knowledge. Five steps:

  1. Fact extraction — direct and inferred facts from each document chunk.
  2. Consolidation — merge related pairs into multi-fact questions.
  3. Verification & rewriting — discard non-self-contained pairs.
  4. Entity surfacing — generate QA requiring entity inference from attributes.
  5. Cross-document synthesis — identify converging clues and parallel properties across documents.

Ablation finding: Removing Step 5 collapses accuracy on NarrativeQA, from 24.00% to 6.37%.
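
The five steps above can be sketched as a simple pipeline. This is an illustrative skeleton only: in practice each step would be a prompt to the GENERATOR model, whereas here the steps are injected as callables. The `QA` record, the function names, and the choice to have Steps 4 and 5 append new pairs rather than replace existing ones are assumptions, not details from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QA:
    question: str
    answer: str
    sources: Tuple[str, ...]  # ids of the documents the pair draws on


Step = Callable[[List[QA]], List[QA]]


def build_reflections(
    chunks: Dict[str, str],
    extract: Callable[[str, str], List[QA]],
    consolidate: Step,
    verify: Step,
    surface_entities: Step,
    cross_document: Step,
) -> List[QA]:
    # Step 1: extract direct and inferred facts from each document chunk.
    pairs = [qa for doc_id, text in chunks.items() for qa in extract(doc_id, text)]
    # Step 2: merge related pairs into multi-fact questions.
    pairs = consolidate(pairs)
    # Step 3: drop or rewrite pairs that are not self-contained.
    pairs = verify(pairs)
    # Step 4 (assumed additive): add QA that require inferring an entity.
    pairs = pairs + surface_entities(pairs)
    # Step 5 (assumed additive): add QA that link clues across documents.
    pairs = pairs + cross_document(pairs)
    return pairs
```

The resulting list of pairs is what the MEMORY model would be fine-tuned on.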

Continual Knowledge Integration

When new corpora arrive, MeMo uses model merging (TIES, DARE, SLERP) to combine separately trained memory models, so total training cost scales as O(K) in the number of corpora K rather than the O(K²) of full retraining. At K=10 corpora, this is a ~5.5× saving (240 vs 1,320 GPU-hours).
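
The quoted figures are consistent with a simple cost model, sketched below. The assumptions are mine rather than the paper's: training cost is linear in the number of corpora seen, merging itself is treated as free, full retraining means retraining on all accumulated corpora each time a new one arrives, and the per-corpus cost of 24 GPU-hours is inferred from 240 / 10.

```python
def merge_cost(k: int, per_corpus_hours: float) -> float:
    """Train one memory model per corpus, then merge (merge cost ignored)."""
    return k * per_corpus_hours


def retrain_cost(k: int, per_corpus_hours: float) -> float:
    """Retrain on all i accumulated corpora when corpus i arrives: sum of 1..k."""
    return per_corpus_hours * k * (k + 1) / 2


PER_CORPUS_HOURS = 24.0  # assumption inferred from 240 GPU-hours / 10 corpora

merged = merge_cost(10, PER_CORPUS_HOURS)       # 240.0
retrained = retrain_cost(10, PER_CORPUS_HOURS)  # 1320.0
print(merged, retrained, retrained / merged)    # 240.0 1320.0 5.5
```

Under this model the saving is (K + 1) / 2, so it grows linearly with the number of corpora.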

Inference Protocol

  1. Grounding stage: Executive decomposes user query into atomic sub-questions.
  2. Entity identification: Iteratively narrows candidate entities via multi-turn sub-queries to the memory model.