Skip to content

Agentopia: Long-Term Life Simulation

Paper: arXiv · Authors: Neph0s et al. · Institution: (multiple)

Problem & Motivation

Existing multi-agent simulations collapse rapidly — agents forget context, lose coherence, and fail to maintain consistent long-term social dynamics. Prior work rarely exceeds hundreds of interaction steps. There is no established framework for running LLM-driven agents over year-long simulated timelines.

Method / Approach

Agentopia introduces a structured weekly simulation cycle with four phases: PlanContact & SchedulingActivityReview. Each agent maintains a persistent memory files system consisting of general.txt, characters/, and others/ directories. Context management combines a roleplay prompt, stage prompt, and message history. The environment itself is modeled as a stateless LLM that provides feedback, orchestrates emergent events, and enforces 16 roleplay principles verified each cycle. An economy system tracks income, spending, and living standards. Life Reward is measured across three dimensions: Social (Weighted PageRank on Affection/Respect graphs), Subjective, and Economy. The framework uses rejection-sampling training on high-advantage trajectories to improve agent behavior.

Key Results

  • +15.6% on CoSER Test overall
  • Anthropomorphism: +23.7%
  • Character Fidelity: +16.4%
  • Agents maintain coherent life narratives across 10 simulated years
  • Emergent social behaviors (friendships, rivalries, economic stratification) arise organically

Contributions

  1. First framework for decade-scale LLM-agent life simulation
  2. Structured weekly cycle with principled context management
  3. Three-dimensional Life Reward metric (Social, Subjective, Economy)
  4. Environment-as-LLM model for world feedback
  5. Rejection-sampling training pipeline for long-horizon agent improvement

Strengths

  • Impressive scale — 100 agents × 3 worlds × 10 simulated years
  • Principled memory and context management design
  • Multi-dimensional evaluation captures both objective and subjective agent outcomes
  • Emergent social phenomena validate the simulation's fidelity

Weaknesses / Limitations

  • LLM costs scale linearly with agents and simulation duration
  • Stateless environment model may miss nuanced physical world interactions
  • Rejection-sampling training requires high-quality trajectory filtering
  • Limited analysis of failure modes or agent collapse scenarios

Connections & Follow-ups

Builds on the generative agent line of work (Park et al., 2023; AgentSims, 2024). The rejection-sampling training approach connects to RL from AI feedback (RLAIF) and self-play methods. The Life Reward framework offers a template for evaluating long-horizon agent behavior beyond simple task completion.

My Take

A solid engineering contribution that pushes multi-agent simulation to unprecedented temporal scales. The memory system and Life Reward framework are particularly well-designed. The real test will be whether these simulations produce genuinely novel social science insights rather than just reflecting the biases baked into their LLM backends.