e2e_cleaned
An updated release of the E2E NLG Challenge data with cleaned meaning
representations (MRs). The E2E data contains dialogue-act-based MRs in the
restaurant domain, each with up to 5 natural-language references; the reference
text is what a model needs to predict.
| Split          | Examples |
|----------------|----------|
| `'test'`       | 4,693    |
| `'train'`      | 33,525   |
| `'validation'` | 4,299    |
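A minimal sketch of how the splits listed above might be loaded, assuming the `tensorflow_datasets` package is installed and the dataset is registered under the name `e2e_cleaned`. The helper names `load_split` and `split_fractions` are hypothetical and exist only for illustration; the split sizes are copied from the table.

```python
# Split sizes as listed in the splits table.
SPLIT_SIZES = {"test": 4693, "train": 33525, "validation": 4299}


def load_split(split="train"):
    """Load one split of e2e_cleaned (downloads the data on first use)."""
    # Imported lazily because tensorflow_datasets is a heavy dependency.
    import tensorflow_datasets as tfds

    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    return tfds.load("e2e_cleaned", split=split)


def split_fractions():
    """Return the share of all examples that falls into each split."""
    total = sum(SPLIT_SIZES.values())
    return {name: count / total for name, count in SPLIT_SIZES.items()}
```

Calling `load_split("validation")` should return a `tf.data.Dataset` over the validation examples; `split_fractions()` only uses the numbers from the table and needs no download.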
    FeaturesDict({
        'input_text': FeaturesDict({
            'table': Sequence({
                'column_header': string,
                'content': string,
                'row_number': int16,
            }),
        }),
        'target_text': string,
    })
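To illustrate the feature structure above, here is a small sketch that turns the `table` feature back into a flat MR string. It assumes the `Sequence` of dicts is decoded as a dict of parallel lists (one entry per slot), and that the `name[value]` notation is the desired output format. The record below is hand-written for illustration and is not copied from the dataset; in particular the `row_number` values are placeholders.

```python
def table_to_mr(table):
    """Join parallel column_header/content lists into 'name[value], ...'."""

    def as_text(value):
        # String features are typically decoded as bytes; accept str too.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    headers = [as_text(h) for h in table["column_header"]]
    contents = [as_text(c) for c in table["content"]]
    return ", ".join(f"{h}[{c}]" for h, c in zip(headers, contents))


# Hypothetical record shaped like the feature structure (values are made up).
example = {
    "input_text": {
        "table": {
            "column_header": [b"name", b"eatType", b"area"],
            "content": [b"The Eagle", b"coffee shop", b"riverside"],
            "row_number": [1, 1, 1],  # placeholder values
        }
    },
    "target_text": b"The Eagle is a coffee shop by the riverside.",
}

mr = table_to_mr(example["input_text"]["table"])
print(mr)  # name[The Eagle], eatType[coffee shop], area[riverside]
```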
| Feature                        | Class        | Shape | Dtype  | Description |
|--------------------------------|--------------|-------|--------|-------------|
|                                | FeaturesDict |       |        |             |
| input_text                     | FeaturesDict |       |        |             |
| input_text/table               | Sequence     |       |        |             |
| input_text/table/column_header | Tensor       |       | string |             |
| input_text/table/content       | Tensor       |       | string |             |
| input_text/table/row_number    | Tensor       |       | int16  |             |
| target_text                    | Tensor       |       | string |             |
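Because `target_text` holds a single string while an MR can have up to 5 references, multi-reference evaluation usually needs the references collected per MR. The sketch below assumes that each reference is stored as a separate record with an identical `table`; this is an assumption about the data layout, not something the feature documentation states. The function name `group_references` and the sample records are hypothetical.

```python
from collections import defaultdict


def group_references(records):
    """Map each distinct table (as a tuple of slot pairs) to its references."""
    grouped = defaultdict(list)
    for record in records:
        table = record["input_text"]["table"]
        # A tuple of (header, content) pairs is hashable and order-preserving.
        key = tuple(zip(table["column_header"], table["content"]))
        grouped[key].append(record["target_text"])
    return dict(grouped)


def make_record(headers, contents, text):
    """Build a hand-written record shaped like the feature structure."""
    table = {
        "column_header": headers,
        "content": contents,
        "row_number": [1] * len(headers),  # placeholder values
    }
    return {"input_text": {"table": table}, "target_text": text}


records = [
    make_record(["name", "area"], ["The Eagle", "riverside"], "The Eagle is by the river."),
    make_record(["name", "area"], ["The Eagle", "riverside"], "By the riverside is The Eagle."),
    make_record(["name"], ["The Mill"], "There is a place called The Mill."),
]

references = group_references(records)
```

Here `references` has two keys, and the key for The Eagle maps to a list of two reference strings in their original order.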
@inproceedings{dusek-etal-2019-semantic,
title = "Semantic Noise Matters for Neural Natural Language Generation",
author = "Du{\v{s}}ek, Ond{\v{r}}ej and
Howcroft, David M. and
Rieser, Verena",
booktitle = "Proceedings of the 12th International Conference on Natural Language Generation",
month = oct # "{--}" # nov,
year = "2019",
address = "Tokyo, Japan",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W19-8652",
doi = "10.18653/v1/W19-8652",
pages = "421--426",
abstract = "Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97{\%}, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.",
}
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2022-12-06 UTC.
- **Additional Documentation**: [Explore on Papers With Code](https://paperswithcode.com/dataset/e2e)
- **Homepage**: <https://github.com/tuetschek/e2e-cleaning>
- **Source code**: [`tfds.datasets.e2e_cleaned.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/e2e_cleaned/e2e_cleaned_dataset_builder.py)
- **Versions**:
  - **`0.1.0`** (default): No release notes.
- **Download size**: `13.92 MiB`
- **Dataset size**: `14.70 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `('input_text', 'target_text')`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.