clinc_oos
Task-oriented dialog systems need to know when a query falls outside their range
of supported intents, but current text classification corpora only define label
sets that cover every example. We introduce a new dataset that includes queries
that are out-of-scope (OOS), i.e., queries that do not fall into any of the
system's supported intents. This poses a new challenge because models cannot
assume that every query at inference time belongs to a system-supported intent
class. Our dataset also covers 150 intent classes over 10 domains, capturing the
breadth that a production task-oriented agent must handle. It offers a way of
more rigorously and realistically benchmarking text classification in
task-driven dialog systems.
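Because a model cannot assume that every query belongs to a supported intent, one simple option is to reject low-confidence predictions. The sketch below is only an illustration and is not taken from this page: `OOS_LABEL`, `predict_with_rejection`, and the `0.7` threshold are hypothetical names and values, and the logits are assumed to come from any intent classifier.

```python
import math
from typing import List, Sequence

# Hypothetical sentinel meaning "out of scope"; it is NOT a label id from the dataset.
OOS_LABEL = -1


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits: Sequence[float], threshold: float = 0.7) -> int:
    """Return the argmax class index, or OOS_LABEL if the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else OOS_LABEL
```

With a confident score vector such as `[4.0, 0.5, 0.2]` the function returns index `0`; with a nearly flat vector such as `[1.0, 0.9, 1.1]` it returns `OOS_LABEL`. The threshold would normally be tuned on the validation splits.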
| Split | Examples |
|---|---|
| 'test' | 4,500 |
| 'test_oos' | 1,000 |
| 'train' | 15,000 |
| 'train_oos' | 100 |
| 'validation' | 3,000 |
| 'validation_oos' | 100 |
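The out-of-scope queries are stored in separate `*_oos` splits, so the share of OOS examples differs a lot between training and evaluation. The following sketch computes that share from the counts in the splits table above. The helper names `oos_fraction` and `load_part_with_oos` are hypothetical, and the loader assumes `tensorflow_datasets` is installed and relies on its `+` syntax for merging splits.

```python
# Example counts copied from the splits table above.
SPLIT_SIZES = {
    "train": 15_000,
    "train_oos": 100,
    "validation": 3_000,
    "validation_oos": 100,
    "test": 4_500,
    "test_oos": 1_000,
}


def oos_fraction(part: str) -> float:
    """Fraction of out-of-scope examples once `part` and `part_oos` are merged."""
    in_scope = SPLIT_SIZES[part]
    oos = SPLIT_SIZES[f"{part}_oos"]
    return oos / (in_scope + oos)


def load_part_with_oos(part: str):
    """Load an in-scope split together with its OOS counterpart (hypothetical helper)."""
    import tensorflow_datasets as tfds  # imported lazily so the arithmetic runs without it

    return tfds.load("clinc_oos", split=f"{part}+{part}_oos")


for part in ("train", "validation", "test"):
    print(f"{part}: {oos_fraction(part):.2%} out-of-scope")
# train: 0.66% out-of-scope
# validation: 3.23% out-of-scope
# test: 18.18% out-of-scope
```

The imbalance is worth keeping in mind when merging splits: the merged training data contains far fewer OOS queries proportionally than the merged test data.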
FeaturesDict({
'domain': int32,
'domain_name': Text(shape=(), dtype=string),
'intent': int32,
'intent_name': Text(shape=(), dtype=string),
'text': Text(shape=(), dtype=string),
})
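As a rough illustration of how this feature structure looks when read, the sketch below converts one example into plain Python values. The helpers `to_python` and `preview` are hypothetical names. `preview` assumes `tensorflow_datasets` is installed and that string features arrive as `bytes` when iterated through `tfds.as_numpy`.

```python
from typing import Any, Dict, List, Mapping


def to_python(example: Mapping[str, Any]) -> Dict[str, Any]:
    """Convert one NumPy-style example into plain Python strings and ints."""

    def as_text(value: Any) -> str:
        # String features are expected as bytes; decode them to str.
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return {
        "text": as_text(example["text"]),
        "intent": int(example["intent"]),
        "intent_name": as_text(example["intent_name"]),
        "domain": int(example["domain"]),
        "domain_name": as_text(example["domain_name"]),
    }


def preview(split: str = "train", limit: int = 3) -> List[Dict[str, Any]]:
    """Load a split and return its first `limit` examples as dictionaries."""
    import tensorflow_datasets as tfds  # imported lazily; requires the package

    ds = tfds.load("clinc_oos", split=split)
    return [to_python(ex) for ex in tfds.as_numpy(ds.take(limit))]
```

Calling `preview("test_oos")` would return a short list of dictionaries with the five keys listed in the `FeaturesDict`.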
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| domain | Tensor | | int32 | |
| domain_name | Text | | string | |
| intent | Tensor | | int32 | |
| intent_name | Text | | string | |
| text | Text | | string | |
@inproceedings{larson-etal-2019-evaluation,
title = "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction",
author = "Larson, Stefan and
Mahendran, Anish and
Peper, Joseph J. and
Clarke, Christopher and
Lee, Andrew and
Hill, Parker and
Kummerfeld, Jonathan K. and
Leach, Kevin and
Laurenzano, Michael A. and
Tang, Lingjia and
Mars, Jason",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1131",
doi = "10.18653/v1/D19-1131",
pages = "1311--1316",
}
Last updated 2022-12-06 UTC.
Additional Documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/clinc150)
Homepage: https://github.com/clinc/oos-eval/
Source code: tfds.text.ClincOOS (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/clinc_oos.py)
Versions: 0.1.0 (default): No release notes.
Download size: 256.01 KiB
Dataset size: 3.40 MiB
Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see as_supervised doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): ('text', 'intent')
Figure (tfds.show_examples: https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples): Not supported.