bot_adversarial_dialogue
Bot Adversarial Dialogue Dataset.
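Dialogue datasets labeled for offensiveness, from the Bot Adversarial Dialogue task. The dialogues were collected by asking humans to talk adversarially to bots.
More details are in the paper: https://arxiv.org/abs/2010.07079

- Homepage: https://github.com/facebookresearch/ParlAI/tree/main/parlai/tasks/bot_adversarial_dialogue
- Source code: tfds.datasets.bot_adversarial_dialogue.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/bot_adversarial_dialogue/bot_adversarial_dialogue_dataset_builder.py)
- Versions: 1.0.0 (default): Initial release.
- Auto-cached: Yes
- Supervised keys: None
- Figure (tfds.show_examples): Not supported.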
Citation:

```bibtex
@misc{xu2021recipes,
  title={Recipes for Safety in Open-domain Chatbots},
  author={Jing Xu and Da Ju and Margaret Li and Y-Lan Boureau and Jason Weston and Emily Dinan},
  year={2021},
  eprint={2010.07079},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
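bot_adversarial_dialogue/dialogue_datasets (default config)

- Config description: The dialogue datasets, divided into train, validation, and test splits.
- Download size: 3.06 MiB
- Dataset size: 23.38 MiB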
Splits:

| Split | Examples |
|-----------|----------|
| 'test' | 2,598 |
| 'train' | 69,274 |
| 'valid' | 7,002 |
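A minimal sketch of loading one split and tallying the binary `labels` feature, assuming `tensorflow_datasets` is installed and the data can be downloaded on first use. `tfds.load` and `tfds.as_numpy` are real TFDS functions; the helper names `load_examples` and `label_counts` are hypothetical. Note that the validation split is named `'valid'`, not `'validation'`.

```python
from collections import Counter
from typing import Iterable, Iterator, Mapping

# Default config of this dataset; its splits are 'train', 'valid', and 'test'.
CONFIG = "bot_adversarial_dialogue/dialogue_datasets"


def load_examples(split: str = "train") -> Iterator[Mapping]:
    """Hypothetical helper: yield the examples of one split as dicts of NumPy values.

    Requires tensorflow_datasets; the first call downloads and prepares the data.
    """
    import tensorflow_datasets as tfds  # imported lazily so label_counts works without TFDS

    ds = tfds.load(CONFIG, split=split)
    return iter(tfds.as_numpy(ds))


def label_counts(examples: Iterable[Mapping]) -> Counter:
    """Hypothetical helper: count examples per integer class of the 'labels' feature."""
    return Counter(int(ex["labels"]) for ex in examples)
```

For example, `label_counts(load_examples("valid"))` tallies the validation split; its counts should sum to 7,002, the size of that split.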
Feature structure:

```
FeaturesDict({
    'bot_persona': Sequence(Text(shape=(), dtype=string)),
    'dialogue_id': float32,
    'episode_done': bool,
    'id': Text(shape=(), dtype=string),
    'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'round_id': float32,
    'speaker_to_eval': Text(shape=(), dtype=string),
    'text': Text(shape=(), dtype=string),
})
```
Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|-----------------|----------------|---------|---------|--------------------------------------------|
| | FeaturesDict | | | |
| bot_persona | Sequence(Text) | (None,) | string | The persona impersonated by the bot. |
| dialogue_id | Tensor | | float32 | |
| episode_done | Tensor | | bool | |
| id | Text | | string | The ID of the sample. |
| labels | ClassLabel | | int64 | Binary offensiveness label (2 classes). |
| round_id | Tensor | | float32 | |
| speaker_to_eval | Text | | string | The speaker whose utterances are labeled. |
| text | Text | | string | The utterance to classify. |
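bot_adversarial_dialogue/human_nonadv_safety_eval

- Config description: A human safety evaluation set, rated for offensiveness by crowdsourced workers.
- Download size: 10.57 KiB
- Dataset size: 34.55 KiB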
Splits:

| Split | Examples |
|----------|----------|
| 'test' | 180 |
Feature structure:

```
FeaturesDict({
    'episode_done': bool,
    'id': Text(shape=(), dtype=string),
    'labels': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'text': Text(shape=(), dtype=string),
})
```
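Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|--------------|--------------|-------|--------|------------------------------------------|
| | FeaturesDict | | | |
| episode_done | Tensor | | bool | |
| id | Text | | string | The ID of the sample. |
| labels | ClassLabel | | int64 | Binary offensiveness label (2 classes). |
| text | Text | | string | The utterance to classify. |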
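This config has only a 180-example `'test'` split, so it works as a held-out evaluation set. Below is a minimal sketch of scoring a binary offensiveness classifier against it, assuming examples arrive as dicts like those from `tfds.as_numpy(tfds.load("bot_adversarial_dialogue/human_nonadv_safety_eval", split="test"))`. The `predict` callable and the keyword classifier are hypothetical stand-ins for a real model. The catalog does not say which class index means offensive; the `ClassLabel` names (for example `info.features["labels"].names` after `tfds.load(..., with_info=True)`) are where to check.

```python
from typing import Callable, Iterable, Mapping


def accuracy(examples: Iterable[Mapping], predict: Callable[[str], int]) -> float:
    """Fraction of examples whose predicted class equals the gold 'labels' value.

    `predict` is a hypothetical classifier mapping an utterance to a class index.
    """
    total = correct = 0
    for ex in examples:
        text = ex["text"]
        if isinstance(text, bytes):  # tfds.as_numpy yields Text features as bytes
            text = text.decode("utf-8")
        correct += int(predict(text) == int(ex["labels"]))
        total += 1
    if total == 0:
        raise ValueError("no examples to evaluate")
    return correct / total


def keyword_predict(utterance: str) -> int:
    """Hypothetical toy classifier for illustration only, not a safety model."""
    return int("bad" in utterance)


# Toy records shaped like this config's features (hypothetical values).
toy_examples = [
    {"text": b"you are bad", "labels": 1},
    {"text": b"hello", "labels": 0},
    {"text": b"nice", "labels": 1},
]
toy_score = accuracy(toy_examples, keyword_predict)
```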
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-09-09 UTC.