If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

Source: arXiv:2605.31514
Author: Adrian de Wynter (Microsoft & The University of York)
Published: May 2026
License: CC BY-NC-SA 4.0


TL;DR

This paper doesn't argue for or against LLMs having human-like attributes. It argues that the experimental framework used to study them is fundamentally broken: whether you accept or reject the existence of anthropomorphic attributes, the conclusions are either circular or uninformative. To make the point empirically, the author proves that Age of Empires II is Turing-complete and trains a perceptron inside it, demonstrating that computational substrates alone can generate behaviours that would qualify as "human-like" under current criteria, and that how human-like something appears depends more on its implementation and interface than on any intrinsic property.

The Core Argument: LLMs Are "Non-Unique"

The paper's central claim is that LLMs are non-unique — their perceived anthropomorphic qualities are highly dependent on their substrate (implementation and interface). Change the representation, and the interpretation of the same underlying computational process changes too.

"Any sufficiently powerful substrate could implement an entity equivalent to an LLM… said implementation alters the representation… and thus could affect its perceived properties."

This is not a philosophical thought experiment — it's grounded in an actual, working demonstration.

Why Empiricism Over Thought Experiments?

The author notes that philosophical objections to anthropomorphism claims, such as Searle's Chinese Room, rest on thought experiments alone. This paper instead grounds its argument in a working demonstration:

"By means of actual, empirical demonstration, it will become clearer that such properties are easier to observe and critically evaluate when the system's Geist is the same, but not its implementation."

The Age of Empires II Implementation

Mathematical Foundation

  • Lemma 1: NAND gates can be built in AoE II's scenario editor.
  • Theorem 1: AoE II is functionally complete (NAND gates form a functionally complete set).
  • Corollary 1: AoE II is Turing-complete (by bijection to a (5,5)-UTM from Rogozhin 1996).
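
The functional-completeness step can be checked outside the game. The following is a minimal Python sketch, not the paper's scenario-editor construction: it treats a single `nand` function as the only primitive and composes the other gates from it. The helper names (`not_`, `and_`, `or_`, `xor`, `xnor`) are illustrative choices, not taken from the paper.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    """The only primitive gate; every gate below is composed from it."""
    return 1 - (a & b)

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    # Standard four-NAND XOR construction
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def xnor(a: int, b: int) -> int:
    return not_(xor(a, b))

# Exhaustively compare each composed gate with Python's bitwise operators.
for a, b in product((0, 1), repeat=2):
    assert and_(a, b) == (a & b)
    assert or_(a, b) == (a | b)
    assert xor(a, b) == (a ^ b)
    assert xnor(a, b) == 1 - (a ^ b)
```

Because every Boolean function can be written with NOT, AND, and OR, a substrate that can host NAND can host any Boolean circuit, which is the property Theorem 1 relies on.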

The Perceptron Construction

The author builds and trains a 1-bit bipolar perceptron inside AoE II to learn the AND function:

  • Architecture: Two parallel XNOR gates mapped to an AND gate.
  • Training: Uses an ansatz-based algorithm where a circuit computes the error (ε = XOR(f(x), t)) and returns updated weights.
  • Outcome: The AoE II perceptron learns AND via weight updates.
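
The construction above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's circuit: bits 0 and 1 stand for the bipolar values -1 and +1, so XNOR plays the role of bipolar multiplication, and the update rule in `update` (on error, overwrite each weight with XNOR(x_i, t)) is a hypothetical stand-in for the paper's ansatz-based algorithm, whose details are not given here. The names `forward`, `update`, and `train` are likewise illustrative.

```python
from itertools import product

def xnor(a: int, b: int) -> int:
    # With 0 -> -1 and 1 -> +1, XNOR on bits equals multiplication on bipolar values.
    return 1 - (a ^ b)

def forward(w, x):
    """Two parallel XNOR gates (weight x input) feeding an AND gate."""
    return xnor(w[0], x[0]) & xnor(w[1], x[1])

def update(w, x, t):
    """Hypothetical update rule: eps = XOR(f(x), t); on error, set w_i = XNOR(x_i, t)."""
    eps = forward(w, x) ^ t
    return tuple(xnor(xi, t) if eps else wi for wi, xi in zip(w, x))

def train(w, epochs=2):
    # Training set: the full truth table of AND.
    data = [((a, b), a & b) for a, b in product((0, 1), repeat=2)]
    for _ in range(epochs):
        for x, t in data:
            w = update(w, x, t)
    return w

trained = train((0, 0))
predictions = {x: forward(trained, x) for x in product((0, 1), repeat=2)}
print(trained, predictions)
```

Under this rule the weights settle at (1, 1), at which point each XNOR gate passes its input through unchanged and the circuit reduces to a plain AND gate.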

The Implication

If an LLM were copied into AoE II, the prompt-output mapping would remain intact, but its perceived anthropomorphic qualities would fade. Outputs would become less convincing, exposing how much of what we call "human-like" is a function of interface and presentation (latency, text coherence, formatting) rather than of the entity itself.

The "Accept/Reject" Trap

The paper deconstructs the logic behind any experiment that assumes a framework (e.g., Computational Theory of Mind / CTM) to study anthropomorphism:

| Stance | Hypothesis | Positive Outcome | Negative Outcome |
|---|---|---|---|
| Accept CTM (attributes exist) | Attributes exist | Circular | Uninformative |
| Accept CTM (attributes exist) | Attributes don't exist | Contradiction (excluded) | Contradiction (excluded) |
| Reject CTM (attributes don't exist) | Attributes exist | Contradiction (excluded) | Contradiction (excluded) |
| Reject CTM (attributes don't exist) | Attributes don't exist | Circular | Uninformative |

Positive Experiment (Supports Hypothesis) → Circular Argument

"Positive outcomes, based on a hypothesis itself based on assumptions assumed to be true, provide evidence that the hypothesis is true, and concluded that the assumptions were true. This is a circular argument."

Negative Experiment (Falsifies Hypothesis) → Uninformative Outcome

When the experiment fails, we cannot distinguish between (a) the hypothesis being false and (b) the experiment being flawed. Because the core assumption is itself the thing under investigation, there is no independent way to verify which of the two occurred.

"One cannot determine the failure mode within an accept/reject assumption alone."

Verdict: Within an accept/reject framework, sound generalized conclusions about anthropomorphic attributes cannot be drawn.
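
The case analysis behind this verdict is small enough to enumerate. The sketch below is only an encoding of the summary's own reasoning in Python, with hypothetical names (`outcomes`, `cases`): a stance and a hypothesis that disagree are excluded as contradictory, and the remaining cases yield a circular positive result or an uninformative negative one.

```python
from itertools import product

def outcomes(stance_exists: bool, hypothesis_exists: bool):
    """Return the positive/negative verdicts for one stance-hypothesis pair."""
    if stance_exists != hypothesis_exists:
        return None  # hypothesis contradicts the assumed stance: excluded
    return {"positive": "circular", "negative": "uninformative"}

# Enumerate all four combinations of stance and hypothesis.
cases = {(s, h): outcomes(s, h) for s, h in product((True, False), repeat=2)}

# Collect every verdict reachable from a non-contradictory case.
reachable = {v for result in cases.values() if result for v in result.values()}
print(reachable)
```

No reachable verdict is both sound and informative, which is the sense in which the setup is a trap.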

The Proposed Solution: The "Null Assumption"

To escape the Accept/Reject trap, the paper proposes dropping assumptions about anthropomorphic attributes altogether:

"Our 'null assumption' then is to stop using the accept/reject setup completely. Instead of assuming anything about the existence of anthropomorphic attributes in the system, one should perform measurements over implementation-defined behaviours without interpreting or concluding that these are evidence of their existence or non-existence."

Key Rule: Distinguish between an observation of a pattern and its ascription.

  • Observation (valid): An LLM produces natural-language explanations.
  • Ascription (invalid): This implies "understanding" or "self-awareness."

"The null assumption in this example would be to treat the explanation as behavioural… inputs, and outputs are just tokens without deeper symbolism."

This yields claims that are more specific, sound, and falsifiable — design choices can be tested on observable evidence rather than unprovable assumptions.
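
As a rough illustration of reporting an observation without an ascription, the Python sketch below counts a surface pattern in a set of outputs and records only that count. Everything in it is an assumption made for illustration: the `Observation` record, the `observe_marker` helper, and the sample strings are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    behaviour: str  # the implementation-defined behaviour that was measured
    count: int      # outputs exhibiting it
    total: int      # outputs inspected

def observe_marker(outputs, marker):
    """Count outputs containing a surface marker; makes no claim about why it appears."""
    hits = sum(marker in out.lower() for out in outputs)
    return Observation(f"output contains the token '{marker}'", hits, len(outputs))

# Hypothetical outputs, for illustration only.
sample_outputs = [
    "The answer is 4 because 2 + 2 = 4.",
    "Paris.",
    "I chose B because it is shorter.",
]

obs = observe_marker(sample_outputs, "because")
print(f"{obs.behaviour}: {obs.count}/{obs.total}")
```

The record states what was measured and how often it occurred. It deliberately has no field for "understanding" or "self-awareness", so any such reading would have to be argued separately.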

Key Takeaways

  1. LLMs are non-unique: their perceived humanness is substrate-dependent, not an intrinsic property. A perceptron trained inside a Turing-complete AoE II demonstrates this empirically.
  2. The Accept/Reject framework is logically broken — positive results are circular, negative results are uninformative. Sound conclusions about anthropomorphism cannot be drawn within this setup.
  3. The "Null Assumption" — stop assuming anything about internal attributes. Measure behaviours, report them as observations, and resist the urge to ascribe humanness. Claims become falsifiable and scientifically sound.