Transformers Are Inherently Succinct¶

Source: Transformers Are Inherently Succinct

TL;DR¶

A new ICLR 2026 paper by Pascal Bergsträßer, Ryan Cotterell, and Anthony W. Lin proves that fixed-precision transformers are exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks (and by extension state-space models), and doubly exponentially more succinct than finite automata. This succinctness means that basic verification problems (emptiness, equivalence) for UHATs (Uniform Hard Attention Transformers) are EXPSPACE-complete — provably intractable under standard complexity assumptions. The key technical insight is that transformers can implement doubly exponentially large counters (0 to 2²N) via a subtle encoding using attention. Translation bounds show UHAT to LTL now has a single exponential blow-up (improved from the prior double exponential bound). A direct consequence: analyzing transformers is computationally very challenging.

The Succinctness Result¶

The paper establishes that fixed-precision transformers (specifically the UHAT — Uniform Hard Attention Transformer — model) are dramatically more succinct than classical computational models:

Exponentially more succinct than Linear Temporal Logic (LTL) — translating a transformer to an equivalent LTL formula can incur an exponential blow-up in size
Exponentially more succinct than recurrent neural networks (RNNs), and by extension state-space models (SSMs)
Doubly exponentially more succinct than finite automata (DFAs/NFAs)

In practical terms: a transformer that can be described in a few hundred parameters might require a finite automaton with more states than atoms in the observable universe to simulate exactly.

How Transformers Achieve This Succinctness¶

The key technical insight involves how transformers manipulate numerical values through attention mechanisms. The authors show that fixed-precision transformers (where each numerical value is stored with a fixed number of bits) can implement counters that range from 0 to 2²N — a doubly exponential range — using only O(N) bits of internal state. They achieve this through a subtle encoding trick:

The attention mechanism can be used to implement iterative doubling
By nesting this doubling within itself, the model achieves a tower-of-exponents effect
The fixed-precision constraint (typically 16-32 bits) imposes a ceiling, but within that ceiling the encoding is remarkably efficient

Verification Is Provably Hard¶

A central consequence: verification problems for transformers are provably intractable:

Emptiness problem (does a given transformer accept any input?): EXPSPACE-complete
Equivalence problem (do two transformers recognize the same language?): EXPSPACE-complete

EXPSPACE is the class of problems solvable using exponential space (memory). Under standard complexity assumptions (PSPACE ≠ EXPSPACE), these problems require more than polynomial or even exponential time to solve. This means that fully automated verification of transformer behavior is fundamentally intractable — even in theory, not just in practice.

Improved Translation Bounds¶

The paper also improves the known translation bounds between transformers and LTL:

Prior work showed a double exponential blow-up when translating UHAT to LTL
This paper improves that to a single exponential blow-up, matching the lower bound

This means the translation is now tight — the exponential blow-up is unavoidable.

Implications for AI Safety and Interpretability¶

The results have sobering implications for the field of AI safety:

Any exact verification of transformer behavior is provably intractable
This is not a limitation of current tools or techniques, but a fundamental property of the computational model
Approximate or statistical verification methods (testing, monitoring, behavioral bounds) are not just practical compromises but theoretically necessary
The succinctness that makes transformers powerful (compressing enormous computational structures into compact parameter sets) is the same property that makes them resistant to analysis

The Bigger Picture¶

The paper fits into a broader research program understanding transformers through the lens of computational complexity theory. Previous work has shown that transformers are computationally powerful (Turing-complete with chain-of-thought, TC0 without). This paper adds: transformers are also exponentially succinct — able to express computations in a fraction of the description size required by classical models. This succinctness is a double-edged sword: it enables transformers to learn complex patterns from limited data, but it also means we can never fully verify what they've learned.

Key Takeaways¶

Fixed-precision transformers are exponentially more succinct than LTL and RNNs, and doubly exponentially more succinct than finite automata
Key technique: transformers implement doubly exponential counters (0 to 2²N) via attention-based encoding
Verification problems (emptiness, equivalence) for UHATs are EXPSPACE-complete — provably intractable
Translation from UHAT to LTL now has a tight single exponential bound
Exact verification of transformer behavior is fundamentally impossible in the general case, making approximate methods theoretically necessary