gap
GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of
(ambiguous pronoun, antecedent name), sampled from Wikipedia and released by
Google AI Language for the evaluation of coreference resolution in practical
applications.
| Split | Examples |
|----------------|----------|
| 'test' | 2,000 |
| 'train' | 2,000 |
| 'validation' | 454 |
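The split sizes can be reconciled with the 8,908 pairs quoted in the description. The following is a minimal sketch, assuming each example contributes two (pronoun, name) pairs, one for candidate `A` and one for candidate `B`; the constant names are illustrative and not part of the dataset API.

```python
# Split sizes as listed for this dataset.
SPLIT_SIZES = {"test": 2000, "train": 2000, "validation": 454}

# Assumption: every example carries two candidate names (fields A and B),
# so each example yields two (pronoun, name) pairs.
CANDIDATES_PER_EXAMPLE = 2

total_examples = sum(SPLIT_SIZES.values())
total_pairs = total_examples * CANDIDATES_PER_EXAMPLE

print(f"examples: {total_examples}")  # 4454
print(f"pairs:    {total_pairs}")     # 8908
```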
FeaturesDict({
    'A': Text(shape=(), dtype=string),
    'A-coref': bool,
    'A-offset': int32,
    'B': Text(shape=(), dtype=string),
    'B-coref': bool,
    'B-offset': int32,
    'ID': Text(shape=(), dtype=string),
    'Pronoun': Text(shape=(), dtype=string),
    'Pronoun-offset': int32,
    'Text': Text(shape=(), dtype=string),
    'URL': Text(shape=(), dtype=string),
})
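To illustrate how the offset fields relate to the text fields, here is a rough sketch that checks whether each mention appears in `Text` at its stated offset. It assumes the offsets are zero-based character offsets into `Text` and that string features have already been decoded to Python `str` (TFDS typically yields them as bytes). The record below is hypothetical and was written for this example; it is not taken from the dataset, and the helper names are illustrative.

```python
# Hypothetical record for illustration only; not taken from the dataset.
EXAMPLE = {
    "ID": "example-1",
    "Text": "Alice met Beth after she left work.",
    "Pronoun": "she",
    "Pronoun-offset": 21,
    "A": "Alice",
    "A-offset": 0,
    "A-coref": False,
    "B": "Beth",
    "B-offset": 10,
    "B-coref": True,
    "URL": "http://example.invalid/page",
}


def span_at(text: str, offset: int, mention: str) -> str:
    """Return the slice of `text` that starts at `offset` and is as long as `mention`."""
    return text[offset:offset + len(mention)]


def offsets_are_consistent(example: dict) -> bool:
    """Check that Pronoun, A, and B each occur in Text at their stated offsets.

    Assumption: offsets are zero-based character offsets into Text.
    """
    text = example["Text"]
    return all(
        span_at(text, example[f"{name}-offset"], example[name]) == example[name]
        for name in ("Pronoun", "A", "B")
    )


print(offsets_are_consistent(EXAMPLE))
```

A check like this can help catch encoding mismatches early, since byte offsets and character offsets diverge as soon as the text contains non-ASCII characters.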
| Feature | Class | Shape | Dtype | Description |
|----------------|--------------|-------|--------|-------------|
| | FeaturesDict | | | |
| A | Text | | string | |
| A-coref | Tensor | | bool | |
| A-offset | Tensor | | int32 | |
| B | Text | | string | |
| B-coref | Tensor | | bool | |
| B-offset | Tensor | | int32 | |
| ID | Text | | string | |
| Pronoun | Text | | string | |
| Pronoun-offset | Tensor | | int32 | |
| Text | Text | | string | |
| URL | Text | | string | |
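The two boolean features `A-coref` and `B-coref` can be collapsed into a single class label for training or evaluation. The sketch below is one possible encoding, not something defined by the dataset: the function name and the label strings `"A"`, `"B"`, and `"NEITHER"` are hypothetical, and it assumes that at most one of the two flags is true for any example.

```python
def coref_label(a_coref: bool, b_coref: bool) -> str:
    """Map the A-coref / B-coref flags to one of "A", "B", or "NEITHER".

    Assumption: the pronoun refers to at most one of the two candidates,
    so both flags being true is treated as invalid input.
    """
    if a_coref and b_coref:
        raise ValueError("expected at most one of A-coref and B-coref to be true")
    if a_coref:
        return "A"
    if b_coref:
        return "B"
    return "NEITHER"


print(coref_label(True, False))   # A
print(coref_label(False, True))   # B
print(coref_label(False, False))  # NEITHER
```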
@article{DBLP:journals/corr/abs-1810-05201,
author = {Kellie Webster and
Marta Recasens and
Vera Axelrod and
Jason Baldridge},
title = {Mind the {GAP:} {A} Balanced Corpus of Gendered Ambiguous Pronouns},
journal = {CoRR},
volume = {abs/1810.05201},
year = {2018},
url = {http://arxiv.org/abs/1810.05201},
archivePrefix = {arXiv},
eprint = {1810.05201},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1810-05201},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Last updated 2022-12-22 UTC.
- **Additional Documentation**: [Explore on Papers With Code](https://paperswithcode.com/dataset/gap)
- **Homepage**: <https://github.com/google-research-datasets/gap-coreference>
- **Source code**: [`tfds.text.Gap`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/gap.py)
- **Versions**:
  - `0.1.0`: Initial release.
  - **`0.1.1`** (default): Fixes parsing of the boolean fields `A-coref` and `B-coref`.
- **Download size**: `2.29 MiB`
- **Dataset size**: `2.96 MiB`
- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): Yes
- **Supervised keys** (see [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `None`
- **Figure** ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): Not supported.