StoryScope: Investigating Idiosyncrasies in AI Fiction

A fascinating new paper on arXiv (2604.03136) introduces STORYSCOPE, a pipeline that extracts discourse-level narrative features from fiction, such as plot structure, character agency, and temporal structure, and uses them to distinguish AI-written from human-written stories.

Unlike surface-level signals such as word choice (for example, overuse of "delve"), these structural features are robust to editing and paraphrasing, which makes them far harder to circumvent.

The Dataset

10,272 human-written stories sourced from Books3
5 LLMs generated mirrored stories: Claude, DeepSeek, Gemini, GPT, and Kimi
61,608 total stories in the corpus

Three-Stage Pipeline

Structured Narrative Representations — Stories are analyzed across 10 dimensions defined by the NarraBench framework.
Cross-Source LLM Comparison — The narrative profiles of each model are compared against each other and against humans.
Feature Discovery — 304 interpretable features are distilled into a compact fingerprint of writing style.

Results

93.2% macro-F1 for Human vs. AI detection using only narrative features (this captures 97% of the performance achieved when including style features too).
68.4% macro-F1 for 6-way authorship attribution (identifying whether a given piece was written by a human or by one of the five models, and if so which one).
Robust to LAMP editing — still achieves 93.9% F1 after text is edited, meaning surface-level paraphrasing does not evade detection.
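
For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so a small class counts as much as a large one. The sketch below is a minimal pure-Python illustration of that definition, not the paper's evaluation code; the function name `macro_f1` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical, not from the paper): one human story is misread as AI.
y_true = ["human", "human", "ai", "ai", "ai", "ai"]
y_pred = ["human", "ai", "ai", "ai", "ai", "ai"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

On this toy input, accuracy is 5/6, but macro-F1 is lower because the single error removes half of the recall on the smaller "human" class. The same function applies unchanged to the 6-way setting, where the labels would be the human class plus the five models.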

Key AI vs. Human Differences

Dimension	AI	Human
Theme explanation	77% of stories over-explain themes	52%
Olfactory/sensory imagery	81% over-describe body/senses	38%
Plot structure	Favors tidy single-track plots	More nonlinearity
Reader address	Rare	Common
Intertextual references	Rare	Common
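
The percentages in the table are prevalence rates: the share of stories from each source that exhibit a given feature. As a rough sketch of how such a comparison could be computed, assuming each story has already been annotated with boolean features, the snippet below groups stories by source and reports the fraction showing one feature. The record layout, the feature name `explains_theme`, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def feature_prevalence(stories, feature):
    """Return {source: fraction of that source's stories where `feature` is true}."""
    counts = defaultdict(lambda: [0, 0])  # source -> [stories with feature, total stories]
    for story in stories:
        tally = counts[story["source"]]
        tally[0] += int(bool(story["features"].get(feature, False)))
        tally[1] += 1
    return {source: hits / total for source, (hits, total) in counts.items()}


# Toy annotations (hypothetical): 3 of 4 AI stories and 2 of 4 human stories
# are marked as explaining their theme outright.
stories = (
    [{"source": "ai", "features": {"explains_theme": True}}] * 3
    + [{"source": "ai", "features": {"explains_theme": False}}]
    + [{"source": "human", "features": {"explains_theme": True}}] * 2
    + [{"source": "human", "features": {"explains_theme": False}}] * 2
)
rates = feature_prevalence(stories, "explains_theme")
print(rates)
```

A large gap between sources on one feature, as in the table rows above, is what makes that feature useful for telling the sources apart.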

Perhaps most interestingly, the paper reports distinct model fingerprints: each LLM leaves a characteristic narrative signature that can be identified even when the topic and genre are the same.

Source: arXiv 2604.03136 — StoryScope