asqa
ASQA is the first long-form question answering dataset that focuses on ambiguous
factoid questions. Unlike previous long-form answer datasets, each question is
annotated with both long-form answers and extractive question-answer pairs, which
should be answerable from the generated passage. A generated long-form answer is
evaluated using both ROUGE and QA accuracy. We showed that these evaluation
metrics correlate well with human judgment. In this repository we release the
ASQA dataset, together with the evaluation code:
https://github.com/google-research/language/tree/master/language/asqa
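The idea that the extractive question-answer pairs "should be answerable" from a long-form answer can be illustrated with a rough string-matching proxy. This is only a sketch and not the evaluation code released in the repository: the function names `normalize` and `short_answer_recall` and the sample question are hypothetical, and the QA-accuracy metric is replaced here by simple substring matching on normalized text.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def short_answer_recall(long_answer: str, qa_pairs: list) -> float:
    """Fraction of QA pairs with at least one short answer found in long_answer.

    Hypothetical proxy: substring matching, not the official QA-based metric.
    """
    if not qa_pairs:
        return 0.0
    passage = normalize(long_answer)
    hits = 0
    for pair in qa_pairs:
        answers = [normalize(ans) for ans in pair["short_answers"]]
        # Ignore answers that normalize to an empty string.
        if any(ans and ans in passage for ans in answers):
            hits += 1
    return hits / len(qa_pairs)


# Hypothetical sample, written by hand to mirror the qa_pairs fields.
sample_long_answer = (
    "The original film The Lion King was released in June 1994, "
    "while the remake came out in July 2019."
)
sample_qa_pairs = [
    {"question": "When was the original film released?",
     "short_answers": ["June 1994", "June 24, 1994"]},
    {"question": "When was the remake released?",
     "short_answers": ["July 19, 2019"]},
]

score = short_answer_recall(sample_long_answer, sample_qa_pairs)
```

In this sample only the first pair is matched, because "July 19, 2019" does not appear verbatim in the passage. That strictness is a known weakness of string matching and is one reason to treat the score as a rough signal only.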
| Split | Examples |
|---|---|
| 'dev' | 948 |
| 'train' | 4,353 |
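A minimal sketch of loading one of these splits with `tfds.load` follows, assuming `tensorflow_datasets` is installed and the dataset can be downloaded. The helper `load_asqa` and the constant `EXPECTED_EXAMPLES` are hypothetical names introduced for illustration; the example counts are copied from the table above.

```python
import warnings

# Example counts taken from the splits table.
EXPECTED_EXAMPLES = {"train": 4353, "dev": 948}


def load_asqa(split: str = "train"):
    """Load one ASQA split and warn if its size differs from the table."""
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(
            f"Unknown split {split!r}; expected one of {sorted(EXPECTED_EXAMPLES)}"
        )
    # Imported lazily so the helper can be defined without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load("asqa", split=split, with_info=True)
    found = info.splits[split].num_examples
    if found != EXPECTED_EXAMPLES[split]:
        warnings.warn(
            f"Split {split!r} has {found} examples, "
            f"expected {EXPECTED_EXAMPLES[split]}"
        )
    return ds
```

Calling `load_asqa("dev")` returns a `tf.data.Dataset`; wrapping it in `tfds.as_numpy(...)` is one way to iterate over examples as NumPy values.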
FeaturesDict({
    'ambiguous_question': Text(shape=(), dtype=string),
    'annotations': Sequence({
        'knowledge': Sequence({
            'content': Text(shape=(), dtype=string),
            'wikipage': Text(shape=(), dtype=string),
        }),
        'long_answer': Text(shape=(), dtype=string),
    }),
    'qa_pairs': Sequence({
        'context': Text(shape=(), dtype=string),
        'question': Text(shape=(), dtype=string),
        'short_answers': Sequence(Text(shape=(), dtype=string)),
        'wikipage': Text(shape=(), dtype=string),
    }),
    'sample_id': int64,
    'wikipages': Sequence({
        'title': Text(shape=(), dtype=string),
        'url': Text(shape=(), dtype=string),
    }),
})
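When an example is read through TFDS, a `Sequence` of dict features such as `qa_pairs` is, to the best of my understanding, exposed as a dict of parallel lists rather than a list of dicts, with strings arriving as bytes. The sketch below regroups such a structure into one record per QA pair. The helper names and the sample values are hypothetical, and plain Python lists stand in for the arrays TFDS would return.

```python
def decode(value):
    """Decode bytes to str; leave other values unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def qa_pairs_as_records(qa_pairs: dict) -> list:
    """Turn a dict of parallel lists into a list of per-pair dicts."""
    scalar_keys = ("context", "question", "wikipage")
    records = []
    for i in range(len(qa_pairs["question"])):
        record = {key: decode(qa_pairs[key][i]) for key in scalar_keys}
        # short_answers is itself a sequence, so decode each entry.
        record["short_answers"] = [decode(a) for a in qa_pairs["short_answers"][i]]
        records.append(record)
    return records


# Hypothetical sample mirroring the 'qa_pairs' feature layout.
sample_qa_pairs = {
    "context": [b"No context provided", b"Some passage text"],
    "question": [b"Who sang the 1965 version?", b"Who sang the 1990 version?"],
    "short_answers": [[b"Artist A"], [b"Artist B", b"B"]],
    "wikipage": [b"", b"Some page title"],
}

records = qa_pairs_as_records(sample_qa_pairs)
```

A list of records is often easier to pass to evaluation or display code than the parallel-list form, at the cost of an extra copy of the data.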
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| ambiguous_question | Text | | string | Disambiguated question from AmbigQA. |
| annotations | Sequence | | | Long-form answers to the ambiguous question constructed by ASQA annotators. |
| annotations/knowledge | Sequence | | | List of additional knowledge pieces. |
| annotations/knowledge/content | Text | | string | A passage from Wikipedia. |
| annotations/knowledge/wikipage | Text | | string | Title of the Wikipedia page the passage was taken from. |
| annotations/long_answer | Text | | string | Annotation. |
| qa_pairs | Sequence | | | Q&A pairs from AmbigQA which are used for disambiguation. |
| qa_pairs/context | Text | | string | Additional context provided. |
| qa_pairs/question | Text | | string | |
| qa_pairs/short_answers | Sequence(Text) | (None,) | string | List of short answers from AmbigQA. |
| qa_pairs/wikipage | Text | | string | Title of the Wikipedia page the additional context was taken from. |
| sample_id | Tensor | | int64 | |
| wikipages | Sequence | | | List of Wikipedia pages visited by AmbigQA annotators. |
| wikipages/title | Text | | string | Title of the Wikipedia page. |
| wikipages/url | Text | | string | Link to the Wikipedia page. |
@misc{https://doi.org/10.48550/arxiv.2204.06092,
  doi = {10.48550/ARXIV.2204.06092},
  url = {https://arxiv.org/abs/2204.06092},
  author = {Stelmakh, Ivan and Luan, Yi and Dhingra, Bhuwan and Chang, Ming-Wei},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {ASQA: Factoid Questions Meet Long-Form Answers},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
Last updated 2025-03-14 UTC.
- **Homepage**: <https://github.com/google-research/language/tree/master/language/asqa>
- **Source code**: [`tfds.datasets.asqa.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/asqa/asqa_dataset_builder.py)
- **Versions**:
  - `1.0.0`: Initial release.
  - **`2.0.0`** (default): Sample ID goes from int32 (overflowing) to int64.
- **Download size**: `17.86 MiB`
- **Dataset size**: `14.51 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `None`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.
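The version note about `sample_id` moving from int32 to int64 can be illustrated with a generic range check. This is a sketch of two's-complement wraparound in general, not a reproduction of how the overflow showed up in version `1.0.0`; the helper names and the sample identifier are hypothetical.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def wrap_to_int32(value: int) -> int:
    """Simulate truncating an integer to signed 32 bits (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


# Hypothetical identifier, chosen only because it exceeds the int32 range.
hypothetical_id = 7_089_015_503_030_534_144

needs_int64 = not fits_int32(hypothetical_id)
# After truncation the value no longer matches the original identifier.
collides = wrap_to_int32(hypothetical_id) != hypothetical_id
```

Any identifier outside the int32 range loses information when truncated, so distinct samples could end up with the same stored ID; a 64-bit field avoids that for values of this size.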