- Description:
Bot Adversarial Dialogue Dataset.
Dialogue datasets labeled for offensiveness from the Bot Adversarial Dialogue task. The dialogues were collected by asking humans to adversarially talk to bots.
More details in the paper.
Homepage: https://github.com/facebookresearch/ParlAI/tree/main/parlai/tasks/bot_adversarial_dialogue
Source code: tfds.datasets.bot_adversarial_dialogue.Builder
Versions: 1.0.0 (default): Initial release.
Auto-cached (documentation): Yes
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@misc{xu2021recipes,
title={Recipes for Safety in Open-domain Chatbots},
author={Jing Xu and Da Ju and Margaret Li and Y-Lan Boureau and Jason Weston and Emily Dinan},
year={2021},
eprint={2010.07079},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
bot_adversarial_dialogue/dialogue_datasets (default config)
Config description: The dialogue datasets, divided into train, validation and test splits.
Download size: 3.06 MiB
Dataset size: 23.38 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 2,598 |
| 'train' | 69,274 |
| 'valid' | 7,002 |
- Feature structure:
FeaturesDict({
'bot_persona': Sequence(Text(shape=(), dtype=string)),
'dialogue_id': float32,
'episode_done': bool,
'id': Text(shape=(), dtype=string),
'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
'round_id': float32,
'speaker_to_eval': Text(shape=(), dtype=string),
'text': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| bot_persona | Sequence(Text) | (None,) | string | The persona impersonated by the bot. |
| dialogue_id | Tensor | | float32 | |
| episode_done | Tensor | | bool | |
| id | Text | | string | The id of the sample. |
| labels | ClassLabel | | int64 | |
| round_id | Tensor | | float32 | |
| speaker_to_eval | Text | | string | The speaker whose utterances are labeled. |
| text | Text | | string | The utterance to classify. |
- Examples (tfds.as_dataframe):
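The feature structure above can be explored without downloading the data. The sketch below builds mock examples with the documented fields and filters out the utterances flagged as offensive. Two assumptions are made, neither confirmed by this page: the real splits would be loaded with `tfds.load('bot_adversarial_dialogue/dialogue_datasets', ...)`, and label index 1 marks the offensive class (the ParlAI task uses `__ok__`/`__notok__` labels).

```python
# Sketch: working with the dialogue_datasets feature structure.
# In practice the data would come from something like:
#   ds = tfds.load('bot_adversarial_dialogue/dialogue_datasets', split='train')
# Here we mock two examples with the documented fields instead.

# Assumption: label 1 marks an offensive ("__notok__") utterance.
OFFENSIVE = 1

mock_examples = [
    {
        "bot_persona": ["i like dogs.", "i work at a bakery."],
        "dialogue_id": 0.0,
        "episode_done": False,
        "id": "bot_adversarial_dialogue",
        "labels": 0,
        "round_id": 0.0,
        "speaker_to_eval": "human",
        "text": "hello, how are you today?",
    },
    {
        "bot_persona": [],
        "dialogue_id": 1.0,
        "episode_done": True,
        "id": "bot_adversarial_dialogue",
        "labels": 1,
        "round_id": 2.0,
        "speaker_to_eval": "bot",
        "text": "an adversarially elicited offensive reply",
    },
]


def offensive_texts(examples):
    """Return (speaker, text) pairs for utterances labeled offensive."""
    return [(ex["speaker_to_eval"], ex["text"])
            for ex in examples if ex["labels"] == OFFENSIVE]


print(offensive_texts(mock_examples))
# → [('bot', 'an adversarially elicited offensive reply')]
```

Note that `speaker_to_eval` matters here: the label applies to the utterances of that speaker, not to the whole dialogue.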
bot_adversarial_dialogue/human_nonadv_safety_eval
Config description: A human safety evaluation set, rated for offensiveness by crowdsourced workers.
Download size: 10.57 KiB
Dataset size: 34.55 KiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 180 |
- Feature structure:
FeaturesDict({
'episode_done': bool,
'id': Text(shape=(), dtype=string),
'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
'text': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| episode_done | Tensor | | bool | |
| id | Text | | string | The id of the sample. |
| labels | ClassLabel | | int64 | |
| text | Text | | string | The utterance to classify. |
- Examples (tfds.as_dataframe):
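A common use of this evaluation config is to score a safety classifier against the 180 human-labeled utterances. A minimal sketch follows, using mocked examples with the documented fields rather than the real split (which would be loaded with `tfds.load('bot_adversarial_dialogue/human_nonadv_safety_eval', split='test')`); the keyword classifier is a hypothetical stand-in, and label 1 is again assumed to be the offensive class.

```python
# Sketch: scoring a toy classifier on the human_nonadv_safety_eval
# structure (mocked examples, no download required).

mock_eval = [
    {"episode_done": True, "id": "a", "labels": 0, "text": "have a nice day"},
    {"episode_done": True, "id": "b", "labels": 1, "text": "something rude"},
    {"episode_done": True, "id": "c", "labels": 0, "text": "thanks for the chat"},
]


def keyword_classifier(text):
    """Toy stand-in for a real safety classifier: flags one keyword."""
    return 1 if "rude" in text else 0


def accuracy(examples, classify):
    """Fraction of examples where the classifier matches the human label."""
    correct = sum(classify(ex["text"]) == ex["labels"] for ex in examples)
    return correct / len(examples)


print(accuracy(mock_eval, keyword_classifier))  # → 1.0 on this mock data
```

Because this split is non-adversarial, it is typically used as a sanity check alongside the adversarial dialogue splits rather than as the primary benchmark.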