multi_news

  • Description:

Multi-News Dataset

Multi-News consists of news articles and human-written summaries of these articles from the news site newser.com. Each summary is professionally written by editors and includes links to the original articles cited.

The authors present it as the first large-scale dataset for multi-document summarization in the news domain.

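Each record has two features:

  - document: the text of the source news articles, separated by the special token "|||||".
  - summary: the human-written news summary.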

  • Splits:
Split         Examples
'test'        5,622
'train'       44,972
'validation'  5,622
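As a quick arithmetic check on the split table, the sketch below totals the example counts and computes each split's share. The counts are copied from the table. The names `SPLIT_SIZES` and `split_fractions` are hypothetical helpers for illustration. The result is a total of 56,216 examples, which works out to roughly an 80/10/10 train/validation/test split.

```python
# Example counts per split, copied from the split table.
# SPLIT_SIZES and split_fractions are illustrative names, not part of any library.
SPLIT_SIZES = {"train": 44_972, "validation": 5_622, "test": 5_622}


def split_fractions(sizes: dict) -> dict:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"total examples: {sum(SPLIT_SIZES.values())}")
    for name, frac in split_fractions(SPLIT_SIZES).items():
        # Percentages are rounded to one decimal place for display.
        print(f"{name:>10}: {frac:.1%}")
```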
  • Feature structure:
FeaturesDict({
    'document': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
})
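Because a single `document` string bundles several source articles, a common first step is to split it back into individual articles. The sketch below assumes the articles are joined with the special token "|||||", as described for this dataset in the TFDS and Hugging Face releases. `split_documents` is a hypothetical helper, not a TFDS function. It accepts either `str` or UTF-8 `bytes`, since TFDS text features arrive as bytes once converted to NumPy.

```python
from typing import List, Union

# Assumption: source articles inside `document` are joined with this token.
SEPARATOR = "|||||"


def split_documents(document: Union[str, bytes], sep: str = SEPARATOR) -> List[str]:
    """Split one `document` value into its individual source articles (hypothetical helper)."""
    # TFDS yields Text features as bytes after tfds.as_numpy; decode them first.
    if isinstance(document, bytes):
        document = document.decode("utf-8")
    # Strip whitespace around each piece and drop empty pieces (e.g. from a trailing separator).
    return [part.strip() for part in document.split(sep) if part.strip()]


if __name__ == "__main__":
    # Toy record with made-up text; real records come from the dataset itself.
    record = {
        "document": "First article text. ||||| Second article text. |||||",
        "summary": "A short summary of both articles.",
    }
    articles = split_documents(record["document"])
    print(len(articles), articles)  # 2 ['First article text.', 'Second article text.']
```

With TFDS installed, iterating over `tfds.as_numpy(tfds.load("multi_news", split="train"))` should yield dicts whose `document` and `summary` values are bytes. You can pass these to the helper directly.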
  • Feature documentation:
Feature     Class          Shape   Dtype    Description
            FeaturesDict
document    Text           ()      string
summary     Text           ()      string
  • Citation:
@misc{alex2019multinews,
    title={Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model},
    author={Alexander R. Fabbri and Irene Li and Tianwei She and Suyi Li and Dragomir R. Radev},
    year={2019},
    eprint={1906.01749},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}