web_nlg
The data contains sets of 1 to 7 triples of the form subject-predicate-object
extracted from [DBpedia](https://wiki.dbpedia.org/), each paired with natural
language text that verbalises these triples. The test data spans 15 different
domains, of which only 10 appear in the training data. The dataset follows a
standardized table format.
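As a rough illustration of what a "standardized table format" for triples could look like, the sketch below flattens a set of triples into one cell record per value. The header names `subject`, `predicate`, and `object`, the helper `triples_to_cells`, and the sample triples are all assumptions made for illustration; this page does not list the actual header values or the dataset's own conversion code.

```python
from typing import Dict, List, Tuple, Union

Cell = Dict[str, Union[str, int]]

# Assumed header names; the page does not state the real values.
HEADERS = ("subject", "predicate", "object")


def triples_to_cells(triples: List[Tuple[str, str, str]]) -> List[Cell]:
    """Flatten (subject, predicate, object) triples into one record per cell."""
    # The description states that each example holds 1 to 7 triples.
    if not 1 <= len(triples) <= 7:
        raise ValueError("expected between 1 and 7 triples")
    cells: List[Cell] = []
    for row_number, triple in enumerate(triples):
        for header, value in zip(HEADERS, triple):
            cells.append(
                {"column_header": header, "content": value, "row_number": row_number}
            )
    return cells


# Made-up placeholder triples, not taken from the dataset.
example = [
    ("Subject_A", "predicate_1", "Object_X"),
    ("Subject_A", "predicate_2", "Object_Y"),
]
cells = triples_to_cells(example)
```

Each triple becomes one table row, so a set of n triples yields 3n cells under this assumed layout.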
| Split           | Examples |
|-----------------|----------|
| `'test_all'`    | 4,928    |
| `'test_unseen'` | 2,433    |
| `'train'`       | 18,102   |
| `'validation'`  | 2,268    |
FeaturesDict({
'input_text': FeaturesDict({
'context': string,
'table': Sequence({
'column_header': string,
'content': string,
'row_number': int16,
}),
}),
'target_text': string,
})
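Because `table` is a `Sequence` over a dictionary of features, TFDS yields it as a dictionary of parallel sequences rather than a list of cell records. The sketch below regroups such parallel lists into one dictionary per row, keyed by `row_number`. The helper `table_to_rows`, the header names, and the sample values are assumptions for illustration, and plain Python strings stand in for the byte strings that `tfds.as_numpy` would normally return.

```python
from collections import defaultdict
from typing import Dict, List


def table_to_rows(table: Dict[str, list]) -> List[Dict[str, str]]:
    """Regroup parallel cell lists into one {header: content} dict per row."""
    rows: Dict[int, Dict[str, str]] = defaultdict(dict)
    for header, content, row_number in zip(
        table["column_header"], table["content"], table["row_number"]
    ):
        rows[int(row_number)][header] = content
    # Return rows in ascending row_number order.
    return [rows[i] for i in sorted(rows)]


# Made-up placeholder values, not taken from the dataset.
table = {
    "column_header": ["subject", "predicate", "object"] * 2,
    "content": [
        "Subject_A", "predicate_1", "Object_X",
        "Subject_A", "predicate_2", "Object_Y",
    ],
    "row_number": [0, 0, 0, 1, 1, 1],
}
rows = table_to_rows(table)
```

When reading real examples, string values would likely need `.decode("utf-8")` before being used as dictionary keys or printed.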
| Feature                        | Class        | Shape | Dtype  | Description |
|--------------------------------|--------------|-------|--------|-------------|
|                                | FeaturesDict |       |        |             |
| input_text                     | FeaturesDict |       |        |             |
| input_text/context             | Tensor       |       | string |             |
| input_text/table               | Sequence     |       |        |             |
| input_text/table/column_header | Tensor       |       | string |             |
| input_text/table/content       | Tensor       |       | string |             |
| input_text/table/row_number    | Tensor       |       | int16  |             |
| target_text                    | Tensor       |       | string |             |
@inproceedings{gardent2017creating,
    title = "Creating Training Corpora for {NLG} Micro-Planners",
    author = "Gardent, Claire and
      Shimorina, Anastasia and
      Narayan, Shashi and
      Perez-Beltrachini, Laura",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    doi = "10.18653/v1/P17-1017",
    pages = "179--188",
    url = "https://www.aclweb.org/anthology/P17-1017.pdf"
}
Last updated 2022-12-06 UTC.
- **Additional Documentation**: [Explore on Papers With Code](https://paperswithcode.com/dataset/webnlg)
- **Homepage**: <https://webnlg-challenge.loria.fr/challenge_2017/>
- **Source code**: [`tfds.structured.web_nlg.WebNlg`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/structured/web_nlg/web_nlg.py)
- **Versions**:
  - **`0.1.0`** (default): No release notes.
- **Download size**: `19.76 MiB`
- **Dataset size**: `13.78 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `('input_text', 'target_text')`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.