schema_guided_dialogue
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated
multi-domain, task-oriented conversations between a human and a virtual
assistant. These conversations involve interactions with services and APIs
spanning 20 domains, ranging from banks and events to media, calendar, travel,
and weather. For most of these domains, the dataset contains multiple different
APIs, many of which have overlapping functionalities but different interfaces,
which reflects common real-world scenarios. The wide range of available
annotations can be used for intent prediction, slot filling, dialogue state
tracking, policy imitation learning, language generation, and user simulation
learning, among other tasks in large-scale virtual assistants. In addition,
the evaluation set contains unseen domains and services, which makes it
possible to quantify performance in zero-shot or few-shot settings.
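The description mentions services that appear only in the evaluation data. A minimal sketch of how one might list them is shown here, assuming each example is a dict with the `metadata/services/name` field from this page's feature structure, laid out as a dict of sequences (the form `tfds.as_numpy` is expected to produce). The helper names `service_names` and `unseen_services` and the toy service names are hypothetical, not part of TFDS or the dataset.

```python
from typing import Dict, Iterable, Set


def service_names(examples: Iterable[Dict]) -> Set[str]:
    """Collect every name found under metadata/services/name."""
    names: Set[str] = set()
    for example in examples:
        for name in example["metadata"]["services"]["name"]:
            # String features usually arrive as bytes when read via NumPy.
            names.add(name.decode("utf-8") if isinstance(name, bytes) else str(name))
    return names


def unseen_services(train: Iterable[Dict], evaluation: Iterable[Dict]) -> Set[str]:
    """Return services that occur in the evaluation examples but not in training."""
    return service_names(evaluation) - service_names(train)


# Toy records with made-up service names, only to show the expected layout.
toy_train = [{"metadata": {"services": {"name": [b"svc_a", b"svc_b"]}}}]
toy_test = [{"metadata": {"services": {"name": [b"svc_b", b"svc_c"]}}}]
print(unseen_services(toy_train, toy_test))
```

With the real data, the two arguments could be `tfds.as_numpy(tfds.load("schema_guided_dialogue", split="train"))` and the same call with `split="test"`; this requires TensorFlow Datasets to be installed and downloads the dataset.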
| Split     | Examples |
|-----------|----------|
| `'dev'`   | 2,482    |
| `'test'`  | 4,201    |
| `'train'` | 16,142   |
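As a quick check on the split sizes, the counts can be summed and turned into proportions. This is only arithmetic on the numbers listed on this page; the variable names are illustrative.

```python
# Example counts copied from the splits listed on this page.
SPLIT_SIZES = {"train": 16_142, "dev": 2_482, "test": 4_201}

total = sum(SPLIT_SIZES.values())
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

print(f"total dialogues: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The total comes to 22,825 dialogues, which agrees with the "over 20k" figure in the description; roughly 71% of them are in the training split.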
    FeaturesDict({
        'first_speaker': ClassLabel(shape=(), dtype=int64, num_classes=2),
        'metadata': FeaturesDict({
            'services': Sequence({
                'name': string,
            }),
        }),
        'utterances': Sequence(Text(shape=(), dtype=string)),
    })
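The feature structure can be read as: one record per dialogue, with a flat list of utterances and a label saying who spoke first. A minimal sketch of turning a record into speaker-labeled turns follows. It rests on two assumptions that this page does not state: that the two `first_speaker` classes are ordered `USER`, `SYSTEM`, and that speakers strictly alternate. Check `builder.info.features` before relying on either. The helper names `label_turns` and `iter_dialogues` and the toy record are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: label order of the 2-class `first_speaker` feature.
SPEAKERS: Tuple[str, str] = ("USER", "SYSTEM")


def label_turns(example: Dict, speakers: Sequence[str] = SPEAKERS) -> List[Tuple[str, str]]:
    """Pair each utterance with a speaker, assuming strict turn alternation."""
    first = int(example["first_speaker"])
    turns = []
    for i, utterance in enumerate(example["utterances"]):
        # String features usually arrive as bytes when read via NumPy.
        text = utterance.decode("utf-8") if isinstance(utterance, bytes) else str(utterance)
        turns.append((speakers[(first + i) % len(speakers)], text))
    return turns


def iter_dialogues(split: str = "train"):
    """Yield labeled turns per dialogue from TFDS (downloads the dataset)."""
    import tensorflow_datasets as tfds  # lazy import: only needed for real data

    dataset = tfds.load("schema_guided_dialogue", split=split)
    for example in tfds.as_numpy(dataset):
        yield label_turns(example)


# Hand-written toy record with the same keys as the feature structure
# (not taken from the dataset).
toy = {
    "first_speaker": 0,
    "metadata": {"services": {"name": [b"toy_service"]}},
    "utterances": [b"Find me a place to eat.", b"Which city?"],
}
print(label_turns(toy))
```

Keeping the pairing logic separate from `iter_dialogues` means it can be tried on hand-made records without installing TensorFlow Datasets or downloading anything.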
| Feature                | Class          | Shape   | Dtype  | Description |
|------------------------|----------------|---------|--------|-------------|
|                        | FeaturesDict   |         |        |             |
| first_speaker          | ClassLabel     |         | int64  |             |
| metadata               | FeaturesDict   |         |        |             |
| metadata/services      | Sequence       |         |        |             |
| metadata/services/name | Tensor         |         | string |             |
| utterances             | Sequence(Text) | (None,) | string |             |
@article{rastogi2019towards,
title={Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset},
author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
journal={arXiv preprint arXiv:1909.05855},
year={2019}
}
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-12-23 UTC.
- **Additional Documentation**:
  [Explore on Papers With Code](https://paperswithcode.com/dataset/sgd)

- **Homepage**:
  <https://github.com/google-research-datasets/dstc8-schema-guided-dialogue>

- **Source code**:
  [`tfds.datasets.schema_guided_dialogue.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/schema_guided_dialogue/schema_guided_dialogue_dataset_builder.py)

- **Versions**:

  - **`1.0.0`** (default): Initial release.

- **Download size**: `35.12 MiB`

- **Dataset size**: `25.36 MiB`

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  Yes

- **Supervised keys** (See
  [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
  `None`

- **Figure**
  ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):
  Not supported.