- Description:
Multi-News Dataset
Multi-News consists of news articles and human-written summaries of these
articles from the news site newser.com. Each summary is professionally written
by editors and includes links to the original articles cited.
This is the first large-scale dataset for multi-document summarization on news articles.
Each record has two features:
document: Texts of the news articles, separated by the special token "|||||".
summary: Summary of the news.
Additional Documentation: Explore on Papers With Code
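Because each document value packs several source articles into one string, a consumer usually has to split it apart again. The following is a minimal sketch, assuming the separator appears exactly as "|||||" and that surrounding whitespace can safely be stripped. The helper name `split_articles` is hypothetical and not part of TFDS.

```python
from typing import List

SEPARATOR = "|||||"  # special token named in the dataset description


def split_articles(document: str) -> List[str]:
    """Split one `document` string into its individual source articles."""
    parts = (part.strip() for part in document.split(SEPARATOR))
    # Drop empty pieces, e.g. from a trailing separator (an assumption about the data).
    return [part for part in parts if part]


example = "First article text. ||||| Second article text."
articles = split_articles(example)
print(len(articles), articles)
```

Stripping whitespace and discarding empty pieces are design choices made for this illustration; keep the raw pieces if exact character offsets matter.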
Source code:
tfds.datasets.multi_news.Builder
Versions:
1.0.0: Initial release.
2.0.0: [Do not use] Update the dataset with valid URLs.
2.1.0 (default): Update the dataset with the correct URLs. The URLs in this version come from HuggingFace's dataset repo, which is curated by the same author: https://huggingface.co/datasets/alexfabbri/multi_news
Download size:
721.73 MiB
Dataset size:
666.50 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 5,622 |
'train' | 44,972 |
'validation' | 5,622 |
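The split sizes add up to 56,216 examples, which works out to roughly an 80/10/10 train/validation/test partition. A short sketch of that arithmetic, using only the counts from the table (the variable names are illustrative):

```python
# Example counts copied from the Splits table.
SPLIT_SIZES = {"train": 44_972, "validation": 5_622, "test": 5_622}

total = sum(SPLIT_SIZES.values())
# Fraction of the whole dataset that each split represents.
shares = {name: count / total for name, count in SPLIT_SIZES.items()}

for name, share in shares.items():
    print(f"{name}: {SPLIT_SIZES[name]:,} examples ({share:.1%})")
print(f"total: {total:,}")
```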
- Feature structure:
FeaturesDict({
    'document': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
document | Text | | string | |
summary | Text | | string | |
Supervised keys (See as_supervised doc): ('document', 'summary')
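With these supervised keys, passing `as_supervised=True` to `tfds.load` should yield `(document, summary)` tuples instead of feature dictionaries. The sketch below assumes `tensorflow_datasets` is installed and that the first call may download the full dataset. The helper names `load_supervised`, `decode_pair`, and `preview` are hypothetical and not part of TFDS.

```python
from typing import Tuple

DATASET_NAME = "multi_news:2.1.0"  # pins the version marked as default


def load_supervised(split: str = "validation"):
    """Return a tf.data.Dataset of (document, summary) string tensors."""
    # Imported lazily so decode_pair can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(DATASET_NAME, split=split, as_supervised=True)


def decode_pair(document: bytes, summary: bytes) -> Tuple[str, str]:
    """Decode the raw byte strings produced by tfds.as_numpy into Python str."""
    return document.decode("utf-8"), summary.decode("utf-8")


def preview(n: int = 1) -> None:
    """Print the start of the first n pairs; downloads the data on first use."""
    import tensorflow_datasets as tfds

    for document, summary in tfds.as_numpy(load_supervised().take(n)):
        doc_text, sum_text = decode_pair(document, summary)
        print(doc_text[:200], "->", sum_text[:200])
```

Calling `preview()` prints a truncated view of one validation pair. Nothing is downloaded until `load_supervised` or `preview` is actually invoked.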
Figure (tfds.show_examples): Not supported.
- Citation:
@misc{alex2019multinews,
title={Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model},
author={Alexander R. Fabbri and Irene Li and Tianwei She and Suyi Li and Dragomir R. Radev},
year={2019},
eprint={1906.01749},
archivePrefix={arXiv},
primaryClass={cs.CL}
}