Agentopia: Long-Term Life Simulation¶

Paper: arXiv · Authors: Neph0s et al. · Institution: (multiple)

Problem & Motivation¶

Existing multi-agent simulations collapse rapidly — agents forget context, lose coherence, and fail to maintain consistent long-term social dynamics. Prior work rarely exceeds hundreds of interaction steps. There is no established framework for running LLM-driven agents over year-long simulated timelines.

Method / Approach¶

Agentopia introduces a structured weekly simulation cycle with four phases: Plan → Contact & Scheduling → Activity → Review. Each agent maintains a persistent memory files system consisting of general.txt, characters/, and others/ directories. Context management combines a roleplay prompt, stage prompt, and message history. The environment itself is modeled as a stateless LLM that provides feedback, orchestrates emergent events, and enforces 16 roleplay principles verified each cycle. An economy system tracks income, spending, and living standards. Life Reward is measured across three dimensions: Social (Weighted PageRank on Affection/Respect graphs), Subjective, and Economy. The framework uses rejection-sampling training on high-advantage trajectories to improve agent behavior.

Key Results¶

+15.6% on CoSER Test overall
Anthropomorphism: +23.7%
Character Fidelity: +16.4%
Agents maintain coherent life narratives across 10 simulated years
Emergent social behaviors (friendships, rivalries, economic stratification) arise organically

Contributions¶

First framework for decade-scale LLM-agent life simulation
Structured weekly cycle with principled context management
Three-dimensional Life Reward metric (Social, Subjective, Economy)
Environment-as-LLM model for world feedback
Rejection-sampling training pipeline for long-horizon agent improvement

Strengths¶

Impressive scale — 100 agents × 3 worlds × 10 simulated years
Principled memory and context management design
Multi-dimensional evaluation captures both objective and subjective agent outcomes
Emergent social phenomena validate the simulation's fidelity

Weaknesses / Limitations¶

LLM costs scale linearly with agents and simulation duration
Stateless environment model may miss nuanced physical world interactions
Rejection-sampling training requires high-quality trajectory filtering
Limited analysis of failure modes or agent collapse scenarios

Connections & Follow-ups¶

Builds on the generative agent line of work (Park et al., 2023; AgentSims, 2024). The rejection-sampling training approach connects to RL from AI feedback (RLAIF) and self-play methods. The Life Reward framework offers a template for evaluating long-horizon agent behavior beyond simple task completion.

My Take¶

A solid engineering contribution that pushes multi-agent simulation to unprecedented temporal scales. The memory system and Life Reward framework are particularly well-designed. The real test will be whether these simulations produce genuinely novel social science insights rather than just reflecting the biases baked into their LLM backends.