multi_nli
The Multi-Genre Natural Language Inference (MultiNLI) corpus is a crowd-sourced
collection of 433k sentence pairs annotated with textual entailment information.
The corpus is modeled on the SNLI corpus, but differs in that it covers a range of
genres of spoken and written text, and supports a distinctive cross-genre
generalization evaluation. The corpus served as the basis for the shared task of
the RepEval 2017 Workshop at EMNLP in Copenhagen.
| Split | Examples |
|---------------------------|----------|
| 'train' | 392,702 |
| 'validation_matched' | 9,815 |
| 'validation_mismatched' | 9,832 |
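
The split sizes listed above can be totalled and compared with a few lines of Python. This is only a minimal sketch: the `SPLIT_SIZES` dictionary copies the numbers from this page, and `split_share` is a hypothetical helper written for illustration, not part of TFDS.

```python
# Split sizes copied from the split listing on this page.
SPLIT_SIZES = {
    "train": 392_702,
    "validation_matched": 9_815,
    "validation_mismatched": 9_832,
}


def split_share(name: str) -> float:
    """Return the fraction of all listed examples that fall in split `name`."""
    total = sum(SPLIT_SIZES.values())
    return SPLIT_SIZES[name] / total


total_examples = sum(SPLIT_SIZES.values())
print(total_examples)  # 412349
print(f"train share: {split_share('train'):.1%}")
```

The three listed splits account for 412,349 of the roughly 433k sentence pairs mentioned in the description; this page does not list the remainder as a split.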
FeaturesDict({
'hypothesis': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=3),
'premise': Text(shape=(), dtype=string),
})
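
One way to read examples with this feature structure is sketched below. Several things here are assumptions rather than facts stated on this page: the label name order in `LABEL_NAMES` (the page only gives `num_classes=3`), and the helper names `decode_label`, `format_example`, and `iter_validation_matched`, which are hypothetical. The sketch treats each example as a dictionary keyed by the three feature names.

```python
from typing import Any, Iterator, Mapping

# ASSUMPTION: this page only states num_classes=3. The name order below is
# not given here and should be checked against the dataset's label metadata.
LABEL_NAMES = ("entailment", "neutral", "contradiction")


def decode_label(label_id: int) -> str:
    """Map an integer class id to a readable name (assumed order)."""
    return LABEL_NAMES[int(label_id)]


def format_example(example: Mapping[str, Any]) -> str:
    """Render one example dict holding 'premise', 'hypothesis' and 'label'."""
    premise = example["premise"]
    hypothesis = example["hypothesis"]
    # Text features arrive as bytes once converted to NumPy; decode if needed.
    if isinstance(premise, bytes):
        premise = premise.decode("utf-8")
    if isinstance(hypothesis, bytes):
        hypothesis = hypothesis.decode("utf-8")
    return f"{premise} => {hypothesis} [{decode_label(example['label'])}]"


def iter_validation_matched() -> Iterator[str]:
    """Yield formatted examples; downloads the dataset on first use."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("multi_nli", split="validation_matched")
    for example in tfds.as_numpy(ds):
        yield format_example(example)
```

Keeping the decoding helpers separate from the loader means they can be tested on hand-built dictionaries without triggering a download.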
| Feature | Class | Shape | Dtype | Description |
|------------|--------------|-------|--------|-------------|
| | FeaturesDict | | | |
| hypothesis | Text | | string | |
| label | ClassLabel | | int64 | |
| premise | Text | | string | |
@InProceedings{N18-1101,
author = "Williams, Adina
and Nangia, Nikita
and Bowman, Samuel",
title = "A Broad-Coverage Challenge Corpus for
Sentence Understanding through Inference",
booktitle = "Proceedings of the 2018 Conference of
the North American Chapter of the
Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long
Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "1112--1122",
location = "New Orleans, Louisiana",
url = "http://aclweb.org/anthology/N18-1101"
}
Last updated 2022-12-06 UTC.
Additional Documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/multinli)
Homepage: https://www.nyu.edu/projects/bowman/multinli/
Source code: tfds.text.MultiNLI (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/multi_nli.py)
Versions: 1.1.0 (default): No release notes.
Download size: 216.34 MiB
Dataset size: 89.50 MiB
Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see as_supervised doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples: https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples): Not supported.