- Description:
WikiDialog is a large dataset of synthetically generated information-seeking conversations. Each conversation in the dataset contains two speakers grounded in a passage from English Wikipedia: one speaker’s utterances consist of exact sentences from the passage; the other speaker is generated by a large language model.
Config description: WikiDialog generated from the dialog inpainter finetuned on OR-QuAC and QReCC.
OQ stands for OR-QuAC and QReCC.
Homepage: https://github.com/google-research/dialog-inpainting#wikidialog-oq
Source code: tfds.text.wiki_dialog.WikiDialog
Versions:
1.0.0 (default): Initial release.
Download size: 7.04 GiB
Dataset size: 36.58 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 11,264,129 |
| 'validation' | 113,822 |
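As a minimal sketch of how the splits above might be loaded, the snippet below wraps `tfds.load` and builds a percentage slice string. The dataset name `"wiki_dialog/OQ"` is an assumption inferred from the source module path and the OQ config, and `percent_slice` and `load_wikidialog` are hypothetical helper names, not part of TFDS.

```python
# Split sizes copied from the Splits table.
SPLIT_SIZES = {"train": 11_264_129, "validation": 113_822}

# Assumption: registered dataset/config name, inferred from the module path
# tfds.text.wiki_dialog.WikiDialog and the "OQ" config described above.
DATASET_NAME = "wiki_dialog/OQ"


def percent_slice(split: str, percent: int) -> str:
    """Build a TFDS split string such as 'train[:1%]' (hypothetical helper)."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"{split}[:{percent}%]"


def load_wikidialog(split: str = "validation", data_dir=None):
    """Load one split as a tf.data.Dataset (hypothetical helper).

    The first call downloads and prepares the full dataset
    (7.04 GiB download, 36.58 GiB on disk).
    """
    # Imported lazily so the helpers above work without tensorflow-datasets.
    import tensorflow_datasets as tfds

    return tfds.load(DATASET_NAME, split=split, data_dir=data_dir)
```

A call such as `load_wikidialog(percent_slice("validation", 1))` would then read roughly the first 1% of the validation split. Slicing only limits what is read; the full download and preparation still take place.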
- Feature structure:
FeaturesDict({
'author_num': Sequence(int32),
'passage': Text(shape=(), dtype=string),
'pid': Text(shape=(), dtype=string),
'sentences': Sequence(Text(shape=(), dtype=string)),
'title': Text(shape=(), dtype=string),
'utterances': Sequence(Text(shape=(), dtype=string)),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| author_num | Sequence(Tensor) | (None,) | int32 | |
| passage | Text | | string | |
| pid | Text | | string | |
| sentences | Sequence(Text) | (None,) | string | |
| title | Text | | string | |
| utterances | Sequence(Text) | (None,) | string | |
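To illustrate how these features might fit together, the sketch below pairs each entry of `utterances` with the speaker index at the same position in `author_num`. It assumes the two sequences are aligned and of equal length, which the feature table does not state. The record `toy_example` is made up for illustration and is not taken from the dataset, and `dialog_turns` and `render_dialog` are hypothetical helpers. No role is assigned to either speaker index, since the mapping is not documented here.

```python
def _text(value) -> str:
    """Decode bytes (as returned by tfds.as_numpy) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def dialog_turns(example) -> list:
    """Return (speaker_index, utterance) pairs for one example.

    Assumption: author_num[i] is the speaker of utterances[i].
    """
    authors = [int(a) for a in example["author_num"]]
    utterances = [_text(u) for u in example["utterances"]]
    if len(authors) != len(utterances):
        raise ValueError("author_num and utterances differ in length")
    return list(zip(authors, utterances))


def render_dialog(example) -> str:
    """Format one example as a title line followed by one line per turn."""
    header = f"{_text(example['title'])} (pid={_text(example['pid'])})"
    lines = [f"[{speaker}] {text}" for speaker, text in dialog_turns(example)]
    return "\n".join([header, *lines])


# Made-up record with the same keys as the feature structure (not real data).
toy_example = {
    "pid": b"toy-0",
    "title": b"Example title",
    "passage": b"First sentence. Second sentence.",
    "sentences": [b"First sentence.", b"Second sentence."],
    "utterances": [b"What is this about?", b"First sentence."],
    "author_num": [0, 1],
}

print(render_dialog(toy_example))
```

With a real split, the same helpers could be applied to each record yielded by `tfds.as_numpy(dataset)`, which returns string features as bytes.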
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{dai2022dialoginpainting,
title={Dialog Inpainting: Turning Documents to Dialogs},
author={Dai, Zhuyun and Chaganty, Arun Tejasvi and Zhao, Vincent and Amini, Aida and Green, Mike and Rashid, Qazi and Guu, Kelvin},
booktitle={International Conference on Machine Learning (ICML)},
year={2022},
organization={PMLR}
}