- Description:
CORD-19 is a resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.
To help organize information in the scientific literature on COVID-19 through abstractive summarization, this dataset parses those articles into document-summary pairs: full_text-abstract or introduction-abstract.
Features include strings of: abstract, full_text, sha (hash of pdf), source_x (source of publication), title, doi (digital object identifier), license, authors, publish_time, journal, url.
Additional Documentation: Explore on Papers With Code
Homepage: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
Source code: tfds.summarization.Covid19sum
Versions:
1.0.0 (default): No release notes.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
This dataset needs to be downloaded manually through the Kaggle API: kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge
Place the downloaded zip file in the manual folder.
Auto-cached (documentation): Unknown
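The manual download step above can be sketched in Python as follows. This is a minimal sketch, not the builder's own code: it assumes the dataset is registered under the snake_case name "covid19sum", that the Kaggle zip file has already been placed in the manual folder, and that tensorflow_datasets is installed. The helper names default_manual_dir and load_covid19sum are hypothetical.

```python
from pathlib import Path


def default_manual_dir() -> Path:
    """Return the default folder TFDS checks for manually downloaded files."""
    return Path("~/tensorflow_datasets/downloads/manual").expanduser()


def load_covid19sum(manual_dir=None):
    """Load every available split of the dataset from a manual download.

    Assumption: the Kaggle zip file is already inside `manual_dir`.
    """
    # Imported lazily so the path helper works without TFDS installed.
    import tensorflow_datasets as tfds

    manual_dir = Path(manual_dir) if manual_dir is not None else default_manual_dir()
    config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    # No split is requested, so tfds.load returns a dict keyed by split name.
    return tfds.load(
        "covid19sum",  # assumed registered name of tfds.summarization.Covid19sum
        download_and_prepare_kwargs={"download_config": config},
    )
```

Calling load_covid19sum() with no argument uses the default manual folder; pass a path to point at a different location.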
Splits:
| Split | Examples |
|---|---|
- Feature structure:
FeaturesDict({
    'abstract': string,
    'authors': string,
    'body_text': Sequence({
        'section': string,
        'text': string,
    }),
    'doi': string,
    'journal': string,
    'license': string,
    'publish_time': string,
    'sha': string,
    'source_x': string,
    'title': string,
    'url': string,
})
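Because body_text is a sequence of section/text pairs, building a full_text-abstract or introduction-abstract pair requires joining those pieces into one document string. The sketch below shows one way this might be done. It assumes the sequence arrives as a dict of parallel lists (possibly holding bytes, as with tfds.as_numpy), and it assumes introduction sections can be found by matching the section name. The helper join_body_text and the sample record are hypothetical and not taken from the dataset.

```python
def _to_str(value):
    """Decode bytes to str; pass other values through str()."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def join_body_text(body_text, only_section=None):
    """Join the 'text' entries of body_text into one newline-separated string.

    If `only_section` is given, keep only entries whose section name
    contains it (case-insensitive).
    """
    sections = [_to_str(s) for s in body_text["section"]]
    texts = [_to_str(t) for t in body_text["text"]]
    kept = [
        text
        for section, text in zip(sections, texts)
        if only_section is None or only_section.lower() in section.lower()
    ]
    return "\n".join(kept)


# Made-up record for illustration only.
example = {
    "abstract": b"A short summary.",
    "body_text": {
        "section": [b"Introduction", b"Methods"],
        "text": [b"Coronaviruses are studied.", b"We collected papers."],
    },
}

# Document side of a full_text-abstract pair.
full_text = join_body_text(example["body_text"])
# Document side of an introduction-abstract pair.
intro = join_body_text(example["body_text"], only_section="introduction")
```

Section names in real records may not match "introduction" exactly, so the filter should be checked against actual data before relying on it.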
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| abstract | Tensor | | string | |
| authors | Tensor | | string | |
| body_text | Sequence | | | |
| body_text/section | Tensor | | string | |
| body_text/text | Tensor | | string | |
| doi | Tensor | | string | |
| journal | Tensor | | string | |
| license | Tensor | | string | |
| publish_time | Tensor | | string | |
| sha | Tensor | | string | |
| source_x | Tensor | | string | |
| title | Tensor | | string | |
| url | Tensor | | string | |
Supervised keys (See as_supervised doc): ('body_text', 'abstract')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@ONLINE {CORD-19-research-challenge,
author = "An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House",
title = "COVID-19 Open Research Dataset Challenge (CORD-19)",
month = "april",
year = "2020",
url = "https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge"
}