Let's Talk About Encrypted Reasoning
Source: Let's Talk About Encrypted Reasoning
TL;DR
Matthew Green investigates the encrypted "reasoning" (or "thinking") blobs returned by frontier LLM APIs from OpenAI and Anthropic. These blobs carry the model's raw chain-of-thought reasoning to the client as Base64-encoded authenticated ciphertext, which supports stateless, zero-retention conversations without letting the client read the internal monologue. Key findings: (1) replay attacks work across sessions, across accounts, and even across models, which implies a single global key under which everyone's reasoning data is escrowed, and replayed blobs remain semantically active; (2) side channels leak information through the encrypted blob size, the reasoning_tokens field, and wall-clock response time, and the researcher used them to extract the byte 0xA3 bit by bit in 80 trials. Attempts to extract system prompts failed because the models do not actually have system prompts in API mode and hallucinate plausible ones instead. Both OpenAI and Anthropic downplayed the findings.
The Encrypted Blob Mechanism
Frontier model APIs have begun shipping encrypted blobs alongside their responses — Base64-encoded authenticated ciphertext that contains the model's raw chain-of-thought reasoning. This design serves a specific purpose: enabling stateless, zero-retention conversations where the server doesn't need to store the internal monologue between turns, while still preventing the client from reading the model's "private" reasoning process.
Green's investigation indicates that these blobs are protected by what appears to be a single global key per provider, shared across all users, sessions, and even models within that provider's ecosystem.
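As a rough illustration of the mechanism, the following Python sketch seals a reasoning trace into a Base64-encoded, authenticated blob and opens it again on the server side. Everything in it is an assumption made for illustration: the names GLOBAL_KEY, seal, and open_blob are hypothetical, and the toy HMAC-based keystream stands in for whatever authenticated cipher the providers actually use, which the source does not specify.

```python
import base64
import hashlib
import hmac
import os

# Hypothetical provider-side secret; real key management is not public.
GLOBAL_KEY = os.urandom(32)


def _subkeys(key: bytes) -> tuple:
    """Derive separate encryption and MAC keys from one secret."""
    return (hashlib.sha256(b"enc" + key).digest(),
            hashlib.sha256(b"mac" + key).digest())


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode (illustrative, not production)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def seal(reasoning: str, key: bytes = GLOBAL_KEY) -> str:
    """Encrypt-then-MAC the reasoning trace and return it as Base64 text."""
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(16)
    plaintext = reasoning.encode("utf-8")
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return base64.b64encode(nonce + ciphertext + tag).decode("ascii")


def open_blob(blob: str, key: bytes = GLOBAL_KEY) -> str:
    """Verify the tag, then decrypt. Raises ValueError on any tampering."""
    enc_key, mac_key = _subkeys(key)
    raw = base64.b64decode(blob)
    nonce, ciphertext, tag = raw[:16], raw[16:-32], raw[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("blob failed authentication")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream)).decode("utf-8")
```

In this sketch the client only ever holds the output of seal, so it can return the blob on the next turn but cannot read or modify it, while the server needs nothing except the key to recover the trace.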
Replay Attack Findings
The most alarming discovery is that replay attacks are trivially possible:
- A reasoning blob captured from one session can be replayed in a different session
- Blobs work across different user accounts entirely
- Most strikingly, blobs can be replayed across different models (e.g., a blob from a weaker model works on a stronger one)
- Replayed blobs remain semantically active — in a demo, Green replayed a blob containing an SSN and the model dutifully output that SSN in the new context
This single-global-key design means that everyone's reasoning data is effectively escrowed under one key. If that key were ever compromised, all past and future reasoning traces would be exposed.
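The replay behavior listed above is what one would expect if the acceptance check depends only on the blob and a shared key. The Python sketch below models that situation under stated assumptions: Context, mint, and accept are hypothetical names, the payload stands in for the encrypted reasoning, and a plain HMAC tag stands in for the real authentication scheme, which the source does not describe.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical single key shared by every account, session, and model.
GLOBAL_KEY = b"\x00" * 32


@dataclass(frozen=True)
class Context:
    account: str
    session: str
    model: str


def mint(payload: bytes, ctx: Context) -> bytes:
    """Issue a blob. Note that ctx is never mixed into the tag."""
    tag = hmac.new(GLOBAL_KEY, payload, hashlib.sha256).digest()
    return payload + tag


def accept(blob: bytes, ctx: Context) -> bool:
    """Accept any blob whose tag verifies, whoever presents it."""
    payload, tag = blob[:-32], blob[-32:]
    expected = hmac.new(GLOBAL_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


alice = Context(account="alice", session="s1", model="model-small")
bob = Context(account="bob", session="s9", model="model-large")

blob = mint(b"opaque encrypted reasoning", alice)
print(accept(blob, alice))  # True
print(accept(blob, bob))    # True: the replay succeeds in a foreign context
```

Because nothing about the account, session, or model enters the tag, the server in this model has no way to tell an honest continuation from a replay.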
Side-Channel Leakage
Even without breaking the encryption, significant information leaks through side channels:
- Blob size: The encrypted blob's byte length correlates with the length of the internal reasoning trace
- reasoning_tokens field: The metadata field explicitly reports how many tokens were spent in reasoning
- Wall-clock timing: Response latency reveals how long the model spent reasoning
Green demonstrated practical exploitation of these side channels by extracting the byte 0xA3 bit by bit across 80 trials (an average of 10 trials per bit), showing that private reasoning content can be inferred without ever decrypting the blob.
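To make the bit-by-bit idea concrete, here is a small Python simulation of a length-based extraction. It is a sketch under heavy assumptions rather than a reconstruction of Green's experiment: the source does not say how the prompts were built, so the simulated oracle simply reports a larger reasoning token count when the targeted bit is 1, and the constants (BASE_TOKENS, EXTRA_TOKENS, NOISE, and an even split of 10 trials per bit) are invented for the example.

```python
import random

SECRET_BYTE = 0xA3      # value hidden inside the "private" reasoning
TRIALS_PER_BIT = 10     # assumed even split: 80 trials / 8 bits
BASE_TOKENS = 200       # assumed baseline reasoning length
EXTRA_TOKENS = 60       # assumed extra reasoning when the probed bit is 1
NOISE = 25              # assumed bounded jitter per observation

rng = random.Random(0)


def observe_reasoning_tokens(bit_index: int) -> int:
    """Simulated API call: returns only the visible token count, never the secret."""
    bit = (SECRET_BYTE >> bit_index) & 1
    return BASE_TOKENS + EXTRA_TOKENS * bit + rng.randint(-NOISE, NOISE)


def recover_byte() -> tuple:
    """Average repeated observations per bit and threshold them."""
    threshold = BASE_TOKENS + EXTRA_TOKENS / 2
    value = 0
    trials = 0
    for bit_index in range(8):
        samples = [observe_reasoning_tokens(bit_index) for _ in range(TRIALS_PER_BIT)]
        trials += len(samples)
        if sum(samples) / len(samples) > threshold:
            value |= 1 << bit_index
    return value, trials


value, trials = recover_byte()
print(hex(value), trials)  # 0xa3 80
```

The jitter here is bounded well below the signal, so the averaging is not strictly needed; with noisier measurements such as wall-clock latency, repeated trials per bit are what make the estimate reliable.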
The System Prompt Puzzle
An interesting negative result: attempts to extract system prompts from the reasoning blobs failed. The models, when asked about their system prompts in API mode, hallucinated plausible-sounding but fictional prompts. This suggests that the API models don't actually carry system prompts in the same way chat-interface versions do — the "system prompt" concept may be an artifact of the chat interface that doesn't carry over to programmatic API usage.
Provider Responses
Both OpenAI and Anthropic downplayed Green's findings. Their responses characterized the replay attack as a feature for maintaining conversation state rather than a security vulnerability, and treated the side-channel leakage as inherent to any encrypted communication system rather than a solvable design flaw.
Key Takeaways
- Encrypted reasoning blobs appear to use a single global key, making replay attacks trivially possible across sessions, accounts, and models
- Side-channel attacks (blob size, token count, timing) leak reasoning content without breaking encryption
- The design prioritizes server convenience (statelessness) over user privacy and security
- From a cryptographic standpoint, the findings point to a fundamentally flawed approach to protecting reasoning data
- A more secure design would use per-session or per-user keys, bind blobs to specific contexts, and pad outputs to prevent side-channel leakage
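The mitigations in the last takeaway can be sketched in a few lines of Python. This is one possible shape under stated assumptions, not a design proposed by the source or by either provider: MASTER_KEY, BUCKET, derive_key, pad, mint, and accept are hypothetical names, the payload stands in for the reasoning trace (encryption is omitted for brevity), and HMAC-SHA256 is used both as a simple key-derivation step and as the authentication tag.

```python
import hashlib
import hmac

MASTER_KEY = b"\x01" * 32   # hypothetical root secret
BUCKET = 1024               # hypothetical padding granularity in bytes


def derive_key(account: str) -> bytes:
    """Per-user key: compromise of one derived key does not expose other users."""
    return hmac.new(MASTER_KEY, b"acct:" + account.encode(), hashlib.sha256).digest()


def pad(payload: bytes, bucket: int = BUCKET) -> bytes:
    """Length-prefix the payload, then zero-pad to the next multiple of bucket."""
    framed = len(payload).to_bytes(4, "big") + payload
    padded_len = -(-len(framed) // bucket) * bucket  # ceiling division
    return framed.ljust(padded_len, b"\x00")


def unpad(padded: bytes) -> bytes:
    length = int.from_bytes(padded[:4], "big")
    return padded[4:4 + length]


def _tag(key: bytes, body: bytes, session: str, model: str) -> bytes:
    # Length-prefix each context field so field boundaries are unambiguous.
    fields = (session.encode(), model.encode())
    ctx = b"".join(len(f).to_bytes(2, "big") + f for f in fields)
    return hmac.new(key, ctx + body, hashlib.sha256).digest()


def mint(payload: bytes, account: str, session: str, model: str) -> bytes:
    body = pad(payload)
    return body + _tag(derive_key(account), body, session, model)


def accept(blob: bytes, account: str, session: str, model: str):
    """Return the payload only if account, session, and model all match."""
    body, tag = blob[:-32], blob[-32:]
    expected = _tag(derive_key(account), body, session, model)
    if not hmac.compare_digest(tag, expected):
        return None
    return unpad(body)


blob = mint(b"reasoning trace", "alice", "s1", "model-a")
print(accept(blob, "alice", "s1", "model-a"))  # b'reasoning trace'
print(accept(blob, "bob", "s1", "model-a"))    # None: replay across accounts fails
```

Padding of this kind only hides the blob size. The reasoning_tokens field and response latency would need separate treatment, such as coarsening or omitting the count and delaying responses to fixed intervals.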