
schema_guided_dialogue

Description:

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple different APIs, many of which have overlapping functionality but different interfaces, reflecting common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, and user simulation learning, among other tasks for large-scale virtual assistants. In addition, the evaluation data contains domains and services not seen during training, to quantify performance in zero-shot or few-shot settings.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/google-research-datasets/dstc8-schema-guided-dialogue
Source code: tfds.datasets.schema_guided_dialogue.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 35.12 MiB
Dataset size: 25.36 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'dev'`	2,482
`'test'`	4,201
`'train'`	16,142
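
Each example is one dialogue, so the split sizes above add up to the "over 20k" conversations mentioned in the description. A quick check of the total and the approximate train/dev/test proportions, using only the counts from the table (`SPLITS` and `split_fractions` are illustrative names, not part of TFDS):

```python
# Example counts copied from the splits table.
SPLITS = {"train": 16_142, "dev": 2_482, "test": 4_201}


def split_fractions(splits: dict) -> dict:
    """Return each split's share of the total number of dialogues."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total = sum(SPLITS.values())
print(f"total dialogues: {total:,}")  # total dialogues: 22,825
for name, frac in split_fractions(SPLITS).items():
    print(f"{name:>5}: {frac:.1%}")
```

Roughly 71% of the dialogues are in `train`, with the remainder split between `dev` and `test`.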

Feature structure:

FeaturesDict({
    'first_speaker': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'metadata': FeaturesDict({
        'services': Sequence({
            'name': string,
        }),
    }),
    'utterances': Sequence(Text(shape=(), dtype=string)),
})
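
With `tfds.as_numpy`, each example arrives as a nested dict mirroring the feature structure: `utterances` is an array of byte strings and `first_speaker` is an integer class index. The catalog does not name the two `first_speaker` classes or state how turns are ordered, so the sketch below assumes the two speakers strictly alternate, starting from `first_speaker`. The helpers `to_turns` and `load_train_dialogues` and the placeholder labels are hypothetical, not part of TFDS.

```python
from typing import Iterator, List, Tuple

# Placeholder labels: the catalog only says first_speaker has num_classes=2.
# Replace with the real names, e.g. from ds_info.features["first_speaker"].names.
SPEAKER_LABELS = ("speaker_0", "speaker_1")


def _decode(value) -> str:
    """TFDS numpy examples store strings as bytes; decode them to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def to_turns(example: dict) -> List[Tuple[str, str]]:
    """Pair each utterance with a speaker label.

    Assumes the two speakers strictly alternate, starting with first_speaker.
    """
    first = int(example["first_speaker"])
    return [
        (SPEAKER_LABELS[(first + i) % 2], _decode(utt))
        for i, utt in enumerate(example["utterances"])
    ]


def load_train_dialogues() -> Iterator[List[Tuple[str, str]]]:
    """Stream the train split as speaker-labelled turns (downloads on first use)."""
    import tensorflow_datasets as tfds  # lazy import so to_turns works without TFDS

    ds = tfds.load("schema_guided_dialogue", split="train")
    for example in tfds.as_numpy(ds):
        yield to_turns(example)


# Hand-made example shaped like a decoded record (text is illustrative).
sample = {"first_speaker": 0, "utterances": [b"Find me a cafe.", b"In which city?"]}
print(to_turns(sample))
```

Keeping the TFDS import inside `load_train_dialogues` means the conversion logic can be tested on hand-built dicts without downloading the dataset.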

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
first_speaker	ClassLabel		int64
metadata	FeaturesDict
metadata/services	Sequence
metadata/services/name	Tensor		string
utterances	Sequence(Text)	(None,)	string
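
Because `metadata/services` is a `Sequence` of a `FeaturesDict`, TFDS stores it column-wise: a decoded example holds the names as one array under `example["metadata"]["services"]["name"]`, not as a list of per-service dicts. A minimal sketch for reading those names and counting how many dialogues involve each service, assuming numpy-decoded examples with byte strings; `service_names` and `count_services` are hypothetical helpers, and the service names in the sample data are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def service_names(example: dict) -> List[str]:
    """Return the service names attached to one dialogue.

    Sequence({'name': string}) is stored as a dict of lists, so the names
    live under example["metadata"]["services"]["name"].
    """
    names = example["metadata"]["services"]["name"]
    return [n.decode("utf-8") if isinstance(n, bytes) else str(n) for n in names]


def count_services(examples: Iterable[dict]) -> Counter:
    """Count how many dialogues involve each service."""
    counts: Counter = Counter()
    for ex in examples:
        # set() so a service listed twice in one dialogue is counted once
        counts.update(set(service_names(ex)))
    return counts


# Illustrative examples shaped like the decoded features (names are made up).
examples = [
    {"metadata": {"services": {"name": [b"Restaurants_1"]}}},
    {"metadata": {"services": {"name": [b"Restaurants_1", b"Hotels_2"]}}},
]
print(count_services(examples))
```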

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
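
Since the builder defines no supervised keys, there is no ready-made (input, target) pair to request with `as_supervised=True`; a training target has to be derived from the features. One common choice, shown here purely as an illustration, is next-utterance prediction: each utterance after the first becomes a response, and the preceding utterances (optionally truncated to the last `max_context`) form its context. `make_response_pairs`, `max_context`, and the `" | "` separator are assumptions, not part of TFDS.

```python
from typing import List, Optional, Sequence, Tuple


def make_response_pairs(
    utterances: Sequence[str],
    max_context: Optional[int] = None,
    sep: str = " | ",
) -> List[Tuple[str, str]]:
    """Build (context, response) pairs for next-utterance prediction.

    Each utterance from index 1 onward is a response; its context is the
    preceding utterances joined with `sep`, keeping only the last
    `max_context` of them when a limit is given.
    """
    pairs = []
    for i in range(1, len(utterances)):
        start = 0 if max_context is None else max(0, i - max_context)
        context = sep.join(utterances[start:i])
        pairs.append((context, utterances[i]))
    return pairs


dialogue = ["I need a hotel.", "Which city?", "San Jose.", "How many nights?"]
for context, response in make_response_pairs(dialogue, max_context=2):
    print(f"{context!r} -> {response!r}")
```

Decoded `utterances` arrays hold bytes, so decode them to `str` before passing them in.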

Citation:

@article{rastogi2019towards,
  title={Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset},
  author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
  journal={arXiv preprint arXiv:1909.05855},
  year={2019}
}
