doc_nli
DocNLI is a large-scale dataset for document-level natural language inference
(NLI). It is built by transforming a broad range of NLP problems into the NLI
format and covers multiple genres of text. The premises are always
document-length, whereas the hypotheses vary in length from single sentences
to passages with hundreds of words. In contrast to some existing
sentence-level NLI datasets, DocNLI contains few artifacts.
| Split          | Examples |
|----------------|----------|
| 'test'         | 267,086  |
| 'train'        | 942,314  |
| 'validation'   | 234,258  |
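The split sizes above can help when budgeting a quick experiment. The following is a minimal sketch, assuming the counts listed in the table: `split_fractions`, `subsplit_spec`, and `load_subsplit` are hypothetical helpers (not part of TFDS). `subsplit_spec` builds a percentage slice string such as `train[:1%]`, which TFDS accepts as the `split` argument of `tfds.load`. The import of `tensorflow_datasets` is deferred so that the arithmetic helpers run without it installed.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"test": 267_086, "train": 942_314, "validation": 234_258}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def subsplit_spec(split, percent):
    """Hypothetical helper: build a TFDS slice string such as 'train[:1%]'."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    if not isinstance(percent, int) or not 0 < percent <= 100:
        raise ValueError("percent must be an integer in (0, 100]")
    return f"{split}[:{percent}%]"


def load_subsplit(split, percent):
    """Load the first `percent` percent of a split (downloads the dataset)."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("doc_nli", split=subsplit_spec(split, percent))


if __name__ == "__main__":
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {fraction:.1%}")
    print(subsplit_spec("train", 1))
```

Calling `load_subsplit("train", 1)` would trigger the full download and preparation on first use, so a small slice saves iteration time but not disk space.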
FeaturesDict({
'hypothesis': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'premise': Text(shape=(), dtype=string),
})
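Given the feature structure above, each example is a dictionary with a `premise`, a `hypothesis`, and an integer `label` in `{0, 1}`. The sketch below shows one way such a dictionary might be converted into a typed record. `NliPair`, `to_nli_pair`, and `ASSUMED_LABEL_NAMES` are hypothetical names introduced for illustration. In particular, the label names and their order are an assumption and are not stated on this page; the actual names can be read from `tfds.builder("doc_nli").info.features["label"].names`.

```python
from typing import Mapping, NamedTuple, Sequence

# ASSUMPTION: placeholder label names and order, for illustration only.
# Read the real ones from the dataset's ClassLabel feature before relying on them.
ASSUMED_LABEL_NAMES = ("not_entailment", "entailment")


class NliPair(NamedTuple):
    premise: str
    hypothesis: str
    label: int
    label_name: str


def _to_text(value) -> str:
    """Decode bytes (as yielded by tfds.as_numpy) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def to_nli_pair(
    example: Mapping[str, object],
    label_names: Sequence[str] = ASSUMED_LABEL_NAMES,
) -> NliPair:
    """Convert one example dict with the three documented keys to an NliPair."""
    label = int(example["label"])
    if not 0 <= label < len(label_names):
        raise ValueError(f"label {label} is outside 0..{len(label_names) - 1}")
    return NliPair(
        premise=_to_text(example["premise"]),
        hypothesis=_to_text(example["hypothesis"]),
        label=label,
        label_name=label_names[label],
    )


if __name__ == "__main__":
    # Made-up example text, not taken from the dataset.
    toy = {"premise": b"A long document.", "hypothesis": b"A claim.", "label": 1}
    print(to_nli_pair(toy))
```

With real data, the same function could be applied to each element of `tfds.as_numpy(tfds.load("doc_nli", split="validation"))`, passing the label names read from the builder.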
| Feature    | Class        | Shape | Dtype  | Description |
|------------|--------------|-------|--------|-------------|
|            | FeaturesDict |       |        |             |
| hypothesis | Text         |       | string |             |
| label      | ClassLabel   |       | int64  |             |
| premise    | Text         |       | string |             |
@inproceedings{yin-etal-2021-docnli,
title={DocNLI: A Large-scale Dataset for Document-level Natural Language Inference},
author={Wenpeng Yin and Dragomir Radev and Caiming Xiong},
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
}
Last updated 2022-12-06 UTC.
- **Additional Documentation**:
  [Explore on Papers With Code](https://paperswithcode.com/dataset/docnli)

- **Homepage**:
  <https://github.com/salesforce/DocNLI/>

- **Source code**:
  [`tfds.text.docnli.DocNLI`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/docnli/docnli.py)

- **Versions**:

  - **`1.0.0`** (default): Initial release.

- **Download size**: `313.89 MiB`

- **Dataset size**: `3.07 GiB`

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  No

- **Supervised keys** (see
  [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
  `None`

- **Figure**
  ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):
  Not supported.