BlindStoryFlow is a narrative-aware, hallucination-resistant AI system that preserves characters, plot flow, and emotional context across picture book pages — making stories truly accessible for blind and low-vision readers.
The Problem
In children's picture books, up to 80% of the narrative is carried by illustrations — facial expressions, character actions, objects, and emotional cues that no screen reader can interpret.
Standard AI — Narrative Thread Lost
Page 1
Standard AI Output
A boy. A red balloon. Grass.
Page 5
Standard AI Output
A boy looking up. A red circle in the sky.
The same child, the same balloon — but the AI treats every page as a new scene with no memory of what came before.
Critical context is invisible to screen readers
Standard screen readers skip illustrations entirely, reading only printed text. Facial expressions, character actions, and visual story cues are completely lost.
Picture books are a child's first story
Exclusion at this stage has long-term impacts on literacy, social connection, and confidence. Both blind children and autistic children who struggle with implied visual context lose access to the full narrative.
AI descriptions today are object lists, not stories
Existing AI tools describe what they see on one page — objects, colors, people — without understanding the narrative thread, character identity, or emotional arc across the book.
See the Difference
These are real outputs from the same picture book pages. The difference between an object list and a narrative description is the difference between confusion and comprehension.
Cat in the Hat · Page 4
"A tall creature wearing a striped hat is holding an umbrella and a fishbowl."
Cat in the Hat · Page 4
"The mischievous Cat in the Hat balances a fishbowl on his umbrella, looking excited while the fish looks terrified."
Story 12 · Page 8 of 12
"A child and a dog in snow. Trees in the background."
Story 12 · Page 8 of 12
"The same child from page 1 continues searching the snowy forest — still carrying the blue lantern she found earlier. Her dog, missing since page 3, has not yet reappeared."
Story 3 · Page 5
"Woman holds basket."
Story 3 · Page 5
"The grandmother returns from the market carrying the same basket from page 2 — now full. Her expression has shifted from searching to relief."
These are not cherry-picked rewrites. BlindStoryFlow's outputs are automatically generated by the pipeline and then validated against the gold standard dataset using UCR and PCER metrics — ensuring the improvement is systematic, not accidental.
Research Architecture
BlindStoryFlow advances beyond AI description tools into verifiable, peer-reviewable research — with a new benchmark, new evaluation metrics, and a method that measurably improves comprehension.
300–1,000 annotated picture book pages from Bloom Library, African Storybook, and Free Kids Books — with 3-layer annotation per page: literal, story-function, and cross-page continuity notes.
Core tier: 300 pages · 42 stories · 3 sources
A cross-page tracking system that enforces character identity, attribute persistence, and unresolved plot point continuity — the technical contribution no existing VLM pipeline includes.
PCER: 5.1% · Baseline: 18.9% · 72% error reduction
Two new automatic metrics — Unsupported Claim Rate (UCR) and Plot Continuity Error Rate (PCER) — validated against a human comprehension study with blind and low-vision participants.
UCR · PCER · Who/What/Why comprehension tests
Technical Methodology
Characters, attributes, relationships, and unresolved plot points are tracked in a structured memory across pages. When a new page is described, memory constrains generation to enforce consistency — same name, same role, same object.
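As a minimal sketch of this idea (the project's actual implementation is not published; all class and field names here are hypothetical), the continuity memory can be modeled as per-story state that each new page description must remain consistent with:

```python
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str                                      # canonical identity, e.g. "the boy"
    attributes: set = field(default_factory=set)   # persistent attributes, e.g. {"red balloon"}
    last_seen_page: int = 0

@dataclass
class StoryMemory:
    characters: dict = field(default_factory=dict)        # name -> Character
    open_plot_points: list = field(default_factory=list)  # unresolved threads

    def update(self, page: int, name: str, attributes=()):
        """Register a character sighting, merging attributes across pages."""
        ch = self.characters.setdefault(name, Character(name))
        ch.attributes.update(attributes)
        ch.last_seen_page = page

    def constraints_for(self, page: int):
        """Facts the generator must not contradict when describing this page."""
        facts = [f"{c.name} (last seen page {c.last_seen_page}): "
                 + ", ".join(sorted(c.attributes))
                 for c in self.characters.values()]
        return facts + list(self.open_plot_points)

memory = StoryMemory()
memory.update(1, "the boy", {"red balloon"})
memory.open_plot_points.append("the balloon has not yet been let go")
memory.update(5, "the boy", {"looking up"})
print(memory.constraints_for(5))
```

The key design point is that memory is read at generation time: the description of page 5 is conditioned on "the boy" being the same entity introduced on page 1, rather than a new scene.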
Core contribution
Claims are constrained to detected visual objects, OCR text, and memory entries. When evidence is weak, the system flags uncertainty explicitly — rather than inventing facts.
Safety layer
Every page generates two outputs: a concise screen-reader-friendly summary, and an expandable narrative detail layer for readers who want full story context. Both channels are evaluated independently.
Testable design
Individual components are removed to measure isolated contribution — producing the rigorous evidence that distinguishes publishable research from a demonstration app.
Scientific rigor
Empirical Validation
Three evaluation layers — automatic metrics, narrative metrics, and a human comprehension study — designed to satisfy peer review standards.
Key Discovery
Error analysis reveals that unsupported claims most frequently involve emotions, intent, and causality — things invisible in a single image. Object-level hallucinations are rare. This has implications for how accessible AI descriptions should be evaluated across all media.
57%
of errors are emotion or intent claims
8.4%
UCR with memory (versus 19.7% baseline)
86%
Comprehension score, narrative method
−57%
Cross-page contradictions vs. baseline
Automatic Metrics
Claims are extracted from generated descriptions and verified against detected objects, OCR text, and memory entries. Unsupported claims are flagged as a percentage of total claims per page.
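Under that definition, UCR is the fraction of extracted claims that no evidence source supports. A minimal sketch, assuming claims arrive as short strings and using naive substring matching as the verifier (the project's actual claim extractor and matcher are unspecified):

```python
def unsupported_claim_rate(claims, evidence):
    """UCR = unsupported claims / total claims for one page.

    claims:   short claim strings extracted from the generated description
    evidence: detected object labels, OCR tokens, and memory entries
    A claim counts as supported only if some evidence item appears in it.
    """
    unsupported = [c for c in claims
                   if not any(e.lower() in c.lower() for e in evidence)]
    return len(unsupported) / len(claims) if claims else 0.0
```

For example, with claims `["the boy holds a red balloon", "a storm is coming"]` and evidence `["boy", "red balloon", "grass"]`, the second claim has no support, giving a UCR of 0.5 for that page.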
"Object-list descriptions systematically reduce comprehension of causality — readers could identify characters but not understand why events occurred."
Narrative Metrics
Cross-page entity consistency is tracked at the story level. Contradictions in character identity, object persistence, or setting continuity are measured as PCER — a metric novel to this research.
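PCER can likewise be sketched as contradictions over tracked entity mentions at the story level. This toy version reduces "contradiction" to a single-attribute mismatch against the first recorded value; the research's actual contradiction test is richer and not published:

```python
def plot_continuity_error_rate(page_entities):
    """PCER = contradicting cross-page mentions / total entity mentions.

    page_entities: one dict per page mapping entity name -> an attribute
    value (e.g. a color). A later mention contradicts if it changes the
    attribute first recorded for that entity.
    """
    seen = {}              # entity -> first recorded attribute
    total = errors = 0
    for page in page_entities:
        for name, attr in page.items():
            total += 1
            if name in seen and seen[name] != attr:
                errors += 1
            else:
                seen.setdefault(name, attr)
    return errors / total if total else 0.0
```

With pages `[{"balloon": "red"}, {"balloon": "red"}, {"balloon": "green"}]`, one of three mentions contradicts, giving a PCER of 1/3 for that story.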
"Continuity memory reduces cross-page contradictions by 72% with less than 8% verbosity increase — a favorable tradeoff for screen-reader users."
Real-World Validation Partner
BlindStoryFlow is validated in partnership with Nethra Vidyalaya, one of India's leading schools for the blind — ensuring that our research reflects the real comprehension needs of blind and low-vision students, not simulated proxies.
10–20
student participants
in comprehension study
3
evaluation layers
auto + narrative + human
Real
blind and low-vision
readers, not proxies
Most AI description tools tell a blind child what objects are in a picture. BlindStoryFlow tells them what is happening in the story — and remembers it, page after page.
Dataset Architecture
BlindStoryFlow Corpus — Evaluation Tiers
| Tier | Bloom Library | African Storybook | Free Kids Books | Total Pages |
|---|---|---|---|---|
| Core | 200 | 20 | 80 | 300 |
| Extended | 350 | 40 | 110 | 500 |
| Large-Scale | 500 | 100 | 400 | 1,000 |
Each page is annotated with a literal description, a story-function description (what matters for plot and emotion), and cross-page continuity notes — following DIAGRAM Center and W3C WAI standards.
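The 3-layer scheme can be represented as one record per page. The field names below are illustrative only, not the corpus's actual schema:

```python
from dataclasses import dataclass

@dataclass
class PageAnnotation:
    book_id: str
    page: int
    literal: str            # layer 1: what is visibly depicted
    story_function: str     # layer 2: what matters for plot and emotion
    continuity_notes: str   # layer 3: links to earlier and later pages

# A hypothetical record in the style of the Story 12 example above:
ann = PageAnnotation(
    book_id="bloom-0042",
    page=8,
    literal="A child and a dog in snow; trees in the background.",
    story_function="The child is still searching; tension is unresolved.",
    continuity_notes="Same child as page 1; carries the lantern found earlier.",
)
```

Keeping the three layers as separate fields lets each serve a different output channel: the literal layer feeds object-level verification, while the continuity layer feeds the cross-page memory.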
All source materials are Creative Commons licensed. Bloom Library (CC BY), African Storybook (CC BY), and Free Kids Books (CC) — fully documented for reproducibility and future research.
Deliberately balanced across three publishers representing global children's literature — ensuring the system is not optimized for a single cultural context or illustration style.
Formal annotation protocol with Cohen's Kappa agreement measurements — the scientific standard for verifiable annotation quality in NLP and accessibility research.
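Cohen's Kappa for two annotators over categorical labels is a standard formula, sketched here for completeness (this is generic code, not the project's annotation tooling):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e):
    observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For instance, annotators labeling four pages `["yes", "yes", "no", "no"]` and `["yes", "no", "no", "no"]` agree on 3 of 4 pages (p_o = 0.75) with chance agreement p_e = 0.5, giving kappa = 0.5.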
Research Team
Conducted as independent research — with original methodology, original dataset, and original findings.
Lead · System Architecture · Evaluation
Designed the BlindStoryFlow pipeline, built the continuity memory module, developed UCR and PCER metrics, and conducted the human comprehension study in partnership with Nethra Vidyalaya.
Co-Researcher · Dataset · Annotation
Applied DIAGRAM Center and W3C WAI guidelines to create the gold standard accessible narration dataset, and co-designed the experimental evaluation framework.
Validation Partner · Human Study
One of India's leading schools for the blind — providing real blind and low-vision student participants for the comprehension study, ensuring findings reflect genuine accessibility needs.
Open Research
Request access to the BlindStoryFlow dataset, validation results, and research findings. Published under open licensing for academic and accessibility research use.
300+
Annotated Pages
CC-licensed, open access
3
Source Libraries
Global, culturally diverse
86%
Comprehension Score
Human study with real readers