Research implementation of a locality-aware abstractive summarization model that exploits document structure (specifically page-level information) to handle long documents.
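The core idea described above can be sketched in miniature: process each page independently (locality), then fuse the per-page outputs into one document-level summary. This is a toy illustration under stated assumptions, not the repo's actual code — it assumes pages are delimited by form-feed characters and substitutes a trivial lead-sentence extractor for the real abstractive model (e.g. BART) run per page.

```python
def split_into_pages(text, page_break="\f"):
    """Split a document into pages. Assumption: pages are delimited by
    form-feed markers; the real implementation derives page boundaries
    from the document's structure."""
    return [p.strip() for p in text.split(page_break) if p.strip()]

def summarize_page(page, max_sentences=1):
    """Toy per-page 'summarizer': keep the leading sentence(s). Stands in
    for a local abstractive model applied to each page in isolation."""
    sentences = [s.strip() for s in page.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def locality_aware_summary(text):
    """Encode each page locally, then fuse per-page outputs into a
    document-level summary (the fusion here is plain concatenation)."""
    pages = split_into_pages(text)
    return " ".join(summarize_page(p) for p in pages)

doc = "Page one intro. More detail.\fPage two findings. Extra notes."
print(locality_aware_summary(doc))
# → Page one intro. Page two findings.
```

The point of the sketch is the control flow, not the summarizer: by bounding each model call to a single page, no attention or context window ever spans the full document, which is how page-level locality sidesteps long-input limits.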
Defensibility
Stars: 18
Forks: 2
PageSum represents a snapshot of NLP research circa late 2021/early 2022 (published at EMNLP 2022) aimed at the context-window limitations of models like BART and LED. With only 18 stars and 2 forks over roughly 3.5 years, it has gained no significant industry traction or developer mindshare. In the current landscape, the locality problem it addresses through architectural changes has been largely neutralized by frontier models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) offering context windows from 128k to 2M tokens and native long-context reasoning. The project is effectively a legacy research artifact: while its insights about document structure remain valid, the implementation is no longer competitive with modern RAG architectures or long-context LLMs. Platform risk is maximal, since summarization is now a commodity API feature offered by every major cloud and AI provider.
TECH STACK
INTEGRATION: reference_implementation
READINESS