summscreen
SummScreen Summarization dataset, non-anonymized, non-tokenized version.
Train/val/test splits and filtering are based on the final tokenized dataset,
but transcripts and recaps provided are based on the untokenized text.
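There are two features:

- transcript: Full episode transcripts, each line of dialogue separated by newlines
- recap: Recaps or summaries of episodes

- Homepage: https://github.com/mingdachen/SummScreen
- Source code: tfds.datasets.summscreen.Builder
- Versions: 1.0.0 (default): Initial release.
- Download size: 841.27 MiB
- Supervised keys: ('transcript', 'recap')
- Citation: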
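Because each line of dialogue in a `transcript` is separated by a newline, a transcript string can be broken into individual lines before further processing. The following is a minimal sketch of that step: `split_dialogue` is a hypothetical helper rather than part of the dataset or of TFDS, and the sample text is made up for illustration.

```python
from typing import List


def split_dialogue(transcript: str) -> List[str]:
    """Split a transcript into non-empty lines of dialogue, stripped of whitespace."""
    return [line.strip() for line in transcript.splitlines() if line.strip()]


# Made-up sample text, not an actual record from the dataset.
sample = "ALICE: Where were you last night?\n\nBOB: I was at the office.\n"
for line in split_dialogue(sample):
    print(line)
```

Blank lines are dropped here on the assumption that they carry no dialogue. Remove the `if line.strip()` filter if empty lines matter for your use case.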
@article{DBLP:journals/corr/abs-2104-07091,
  author        = {Mingda Chen and
                   Zewei Chu and
                   Sam Wiseman and
                   Kevin Gimpel},
  title         = {SummScreen: {A} Dataset for Abstractive Screenplay Summarization},
  journal       = {CoRR},
  volume        = {abs/2104.07091},
  year          = {2021},
  url           = {https://arxiv.org/abs/2104.07091},
  archivePrefix = {arXiv},
  eprint        = {2104.07091},
  timestamp     = {Mon, 19 Apr 2021 16:45:47 +0200},
  biburl        = {https://dblp.org/rec/journals/corr/abs-2104-07091.bib},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}
summscreen/fd (default config)

- Config description: ForeverDreaming
- Dataset size: 132.99 MiB
- Auto-cached: Yes
| Split          | Examples |
|----------------|----------|
| 'test'         | 337      |
| 'train'        | 3,673    |
| 'validation'   | 338      |
FeaturesDict({
'episode_number': Text(shape=(), dtype=string),
'episode_title': Text(shape=(), dtype=string),
'recap': Text(shape=(), dtype=string),
'show_title': Text(shape=(), dtype=string),
'transcript': Text(shape=(), dtype=string),
'transcript_author': Text(shape=(), dtype=string),
})
| Feature           | Class        | Shape | Dtype  | Description |
|-------------------|--------------|-------|--------|-------------|
|                   | FeaturesDict |       |        |             |
| episode_number    | Text         |       | string |             |
| episode_title     | Text         |       | string |             |
| recap             | Text         |       | string |             |
| show_title        | Text         |       | string |             |
| transcript        | Text         |       | string |             |
| transcript_author | Text         |       | string |             |
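To show how the `summscreen/fd` features might be read in practice, here is a minimal sketch using `tfds.load` and `tfds.as_numpy`. The helper names `load_summscreen_fd`, `describe_example`, and `print_first_example` are hypothetical. The sketch assumes `tensorflow_datasets` is installed, and that scalar `Text` features arrive as UTF-8 `bytes` once the dataset is converted with `tfds.as_numpy`.

```python
def load_summscreen_fd(split="train"):
    """Return one split of the summscreen/fd config as a tf.data.Dataset."""
    # Imported lazily so that defining these helpers does not require TFDS.
    import tensorflow_datasets as tfds

    # The first call downloads and prepares the dataset.
    return tfds.load("summscreen/fd", split=split)


def describe_example(example):
    """Summarize one example (a dict of bytes values) as a short string."""
    show = example["show_title"].decode("utf-8")
    title = example["episode_title"].decode("utf-8")
    recap = example["recap"].decode("utf-8")
    return f"{show} / {title}: {len(recap.split())} recap words"


def print_first_example(split="validation"):
    """Load the split and print a summary of its first example."""
    import tensorflow_datasets as tfds

    dataset = load_summscreen_fd(split)
    for example in tfds.as_numpy(dataset.take(1)):
        print(describe_example(example))
```

Nothing is downloaded until `print_first_example()` or `load_summscreen_fd()` is called. Calling `tfds.load("summscreen/fd", split="train", as_supervised=True)` instead would yield `(transcript, recap)` tuples.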
summscreen/tms

- Config description: TVMegaSite
- Dataset size: 592.53 MiB
- Auto-cached: No
| Split          | Examples |
|----------------|----------|
| 'test'         | 1,793    |
| 'train'        | 18,915   |
| 'validation'   | 1,795    |
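As a quick arithmetic check on the split tables for both configurations, the sketch below computes each split's share of its config's examples. The counts are copied from the tables on this page. `SPLITS` and `split_fractions` are hypothetical names chosen for illustration.

```python
# Split sizes copied from the tables on this page.
SPLITS = {
    "fd": {"train": 3673, "validation": 338, "test": 337},
    "tms": {"train": 18915, "validation": 1795, "test": 1793},
}


def split_fractions(counts):
    """Return each split's share of the config's total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for config, counts in SPLITS.items():
    shares = split_fractions(counts)
    summary = ", ".join(f"{name}={share:.1%}" for name, share in shares.items())
    print(f"{config}: {sum(counts.values())} examples ({summary})")
```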
FeaturesDict({
'episode_summary': Text(shape=(), dtype=string),
'recap': Text(shape=(), dtype=string),
'recap_author': Text(shape=(), dtype=string),
'show_title': Text(shape=(), dtype=string),
'transcript': Text(shape=(), dtype=string),
'transcript_author': Tensor(shape=(None,), dtype=string),
})
| Feature           | Class        | Shape   | Dtype  | Description |
|-------------------|--------------|---------|--------|-------------|
|                   | FeaturesDict |         |        |             |
| episode_summary   | Text         |         | string |             |
| recap             | Text         |         | string |             |
| recap_author      | Text         |         | string |             |
| show_title        | Text         |         | string |             |
| transcript        | Text         |         | string |             |
| transcript_author | Tensor       | (None,) | string |             |
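In `summscreen/tms`, `transcript_author` is a variable-length string tensor with shape `(None,)`, whereas in `summscreen/fd` it is a scalar `Text` feature. Code that handles both configs needs to accept either form. The sketch below is one way to normalize such values. `to_text` is a hypothetical helper, and it assumes examples were converted with `tfds.as_numpy`, so that scalars arrive as `bytes` and 1-D features arrive as sequences of `bytes`.

```python
def to_text(value):
    """Decode a string feature to str, or to a list of str for 1-D features."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, str):
        return value
    # Anything else is treated as a sequence of byte strings,
    # such as a list or a 1-D NumPy array of bytes.
    return [to_text(item) for item in value]


# Made-up values standing in for one scalar and one variable-length feature.
print(to_text(b"Example Show"))
print(to_text([b"author_one", b"author_two"]))
```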
Last updated 2023-01-13 UTC.