assin2
Contextualization
ASSIN 2 is the second edition of the Avaliação de Similaridade Semântica e
Inferência Textual (Evaluating Semantic Similarity and Textual Entailment). It
was a workshop co-located with STIL 2019
(http://comissoes.sbc.org.br/ce-pln/stil2019/). It follows the first edition of
ASSIN (http://propor2016.di.fc.ul.pt/?page_id=381), proposing a new shared task
with new data.
The workshop evaluated systems that assess two types of relations between two
sentences: Semantic Textual Similarity and Textual Entailment.
Semantic Textual Similarity consists of quantifying the level of semantic
equivalence between sentences, while Textual Entailment Recognition consists of
classifying whether the first sentence entails the second.
Data
The corpus used in ASSIN 2 is composed of rather simple sentences. Following the
procedures of SemEval 2014 Task 1, we tried to remove named entities and
indirect speech from the corpus, and to keep all verbs in the present tense.
The annotation instructions given to annotators are available in Portuguese
(https://drive.google.com/open?id=1aUPhywEHD0r_pxPiTqZwS0fRj-1Xda2w).
The training and validation data are composed, respectively, of 6,500 and 500
sentence pairs in Brazilian Portuguese, annotated for entailment and semantic
similarity. Semantic similarity values range from 1 to 5, and text entailment
classes are either entailment or none. The test data are composed of
approximately 3,000 sentence pairs with the same annotation. All data were
manually annotated.
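The annotation scheme described above can be sketched as a small validated record. This is only an illustration: the class name `SentencePair`, the constant `ENTAILMENT_CLASSES`, and the lowercase label strings are assumptions, not names taken from the corpus files.

```python
from dataclasses import dataclass

# Hypothetical label strings; the description only says the classes are
# "entailment" or "none".
ENTAILMENT_CLASSES = ("none", "entailment")


@dataclass(frozen=True)
class SentencePair:
    """One annotated pair: does `text` entail `hypothesis`, and how similar are they?"""

    text: str
    hypothesis: str
    entailment: str
    similarity: float

    def __post_init__(self) -> None:
        # Entailment is a two-way classification.
        if self.entailment not in ENTAILMENT_CLASSES:
            raise ValueError(f"unknown entailment class: {self.entailment!r}")
        # Similarity is annotated on a scale from 1 to 5.
        if not 1.0 <= self.similarity <= 5.0:
            raise ValueError(f"similarity out of range [1, 5]: {self.similarity}")
```

Constructing a `SentencePair` with a similarity of 0.5 or an entailment class outside the two listed would raise `ValueError`.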
Evaluation
Submissions to ASSIN 2 were evaluated with the same metrics as the first ASSIN:
F1 (the harmonic mean of precision and recall) was the main metric for textual
entailment, and Pearson correlation was the main metric for semantic
similarity. The evaluation scripts (https://github.com/erickrf/assin) are the
same as in the last edition.
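As a rough illustration of the two metrics, the sketch below computes F1 for a single positive class and the Pearson correlation in plain Python. It is not the official evaluation script, which may average F1 differently across classes; the function names and the choice of `1` as the positive (entailment) label are assumptions.

```python
import math
from typing import Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 of precision and recall for one class (assumed positive label: 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between gold and predicted similarity scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

For example, with gold labels `[1, 1, 0, 0]` and predictions `[1, 0, 1, 0]`, precision and recall are both 0.5, so `f1_score` returns 0.5.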
P.S.: This description is extracted from the official homepage
(https://sites.google.com/view/assin2/english).
Splits

| Split          | Examples |
|----------------|----------|
| 'test'         | 2,448    |
| 'train'        | 6,500    |
| 'validation'   | 500      |
Feature structure

FeaturesDict({
    'entailment': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'hypothesis': Text(shape=(), dtype=string),
    'id': int32,
    'similarity': float32,
    'text': Text(shape=(), dtype=string),
})
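A minimal sketch of reading these features, assuming the `tensorflow_datasets` package is installed and the dataset can be downloaded. `tfds.load` and `tfds.as_numpy` are existing TFDS calls; the helper names `load_assin2` and `describe_example` are hypothetical and introduced only for illustration.

```python
from typing import Any, Dict, Iterable


def describe_example(example: Dict[str, Any]) -> str:
    """Format one example dict using the feature keys listed above."""

    def as_str(value: Any) -> str:
        # tfds.as_numpy yields string features as bytes; decode them.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return "id={} entailment={} similarity={:.2f} | {} => {}".format(
        int(example["id"]),
        int(example["entailment"]),
        float(example["similarity"]),
        as_str(example["text"]),
        as_str(example["hypothesis"]),
    )


def load_assin2(split: str = "train") -> Iterable[Dict[str, Any]]:
    """Load one split ('train', 'validation' or 'test') as NumPy-backed dicts."""
    # Imported lazily so describe_example works without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.as_numpy(tfds.load("assin2", split=split))
```

Iterating over `load_assin2("validation")` and passing each item to `describe_example` would print one line per sentence pair. The `entailment` value is the integer class index; which index corresponds to which class name is not stated here and should be checked against the dataset's `ClassLabel` metadata.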
Feature documentation

| Feature    | Class        | Shape | Dtype   | Description |
|------------|--------------|-------|---------|-------------|
|            | FeaturesDict |       |         |             |
| entailment | ClassLabel   |       | int64   |             |
| hypothesis | Text         |       | string  |             |
| id         | Tensor       |       | int32   |             |
| similarity | Tensor       |       | float32 |             |
| text       | Text         |       | string  |             |
@inproceedings{DBLP:conf/propor/RealFO20,
author = {Livy Real and
Erick Fonseca and
Hugo Gon{\c{c}}alo Oliveira},
editor = {Paulo Quaresma and
Renata Vieira and
Sandra M. Alu{\'{\i}}sio and
Helena Moniz and
Fernando Batista and
Teresa Gon{\c{c}}alves},
title = {The {ASSIN} 2 Shared Task: {A} Quick Overview},
booktitle = {Computational Processing of the Portuguese Language - 14th International
Conference, {PROPOR} 2020, Evora, Portugal, March 2-4, 2020, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {12037},
pages = {406--412},
publisher = {Springer},
year = {2020},
url = {https://doi.org/10.1007/978-3-030-41505-1_39},
doi = {10.1007/978-3-030-41505-1_39},
timestamp = {Tue, 03 Mar 2020 09:40:18 +0100},
biburl = {https://dblp.org/rec/conf/propor/RealFO20.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Last updated 2022-12-06 UTC.
Additional Documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/assin2)
Homepage: https://sites.google.com/view/assin2/english
Source code: tfds.datasets.assin2.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/assin2/assin2_dataset_builder.py)
Versions: 1.0.0 (default): Initial release.
Download size: 2.02 MiB
Dataset size: 1.82 MiB
Auto-cached: Yes
Supervised keys: None
Figure (tfds.show_examples): Not supported.