databricks_dolly
databricks-dolly-15k is an open source dataset of instruction-following records used to train databricks/dolly-v2-12b. The records were generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.
Splits:

| Split   | Examples |
|---------|----------|
| 'train' | 15,014   |
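A minimal sketch of reading the 'train' split with TensorFlow Datasets, assuming the `tensorflow_datasets` package is installed and the dataset can be downloaded on first use. The helper names `decode_example` and `iter_dolly` are hypothetical and introduced only for illustration; `tfds.load` and `tfds.as_numpy` are the library calls used.

```python
from typing import Any, Dict, Iterator


def decode_example(example: Dict[str, Any]) -> Dict[str, Any]:
    """Decode UTF-8 byte strings (how Text features arrive as NumPy) into str."""
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in example.items()
    }


def iter_dolly(split: str = "train") -> Iterator[Dict[str, Any]]:
    """Yield decoded records from the requested split (hypothetical helper)."""
    # Imported lazily so decode_example can be used without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("databricks_dolly", split=split)
    for example in tfds.as_numpy(ds):
        yield decode_example(example)
```

Calling `next(iter_dolly())` would be expected to return one dictionary with the four text fields of the dataset, though the first call triggers the download.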
Feature structure:

FeaturesDict({
    'category': Text(shape=(), dtype=string),
    'context': Text(shape=(), dtype=string),
    'instruction': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
})
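As an illustration of how the four string fields fit together, the sketch below joins one record into a single training-style text. The template, the function name `format_record`, and the sample record are all assumptions made for this example, not the format used to train any Dolly model; it also assumes `context` may be an empty string for some records.

```python
from typing import Mapping


def format_record(record: Mapping[str, str]) -> str:
    """Join instruction, optional context, and response into one text block."""
    parts = [f"Instruction: {record['instruction'].strip()}"]
    # Assumption: context can be empty, in which case the section is skipped.
    context = record.get("context", "").strip()
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Response: {record['response'].strip()}")
    return "\n\n".join(parts)


# Made-up record for illustration; not taken from the dataset.
sample = {
    "category": "closed_qa",
    "context": "The library opens at 9 am on weekdays.",
    "instruction": "When does the library open on weekdays?",
    "response": "At 9 am.",
}
print(format_record(sample))
```

The `category` field is left out of the text here; it could instead be used to filter or group records before formatting.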
Feature documentation:

| Feature     | Class        | Shape | Dtype  | Description |
|-------------|--------------|-------|--------|-------------|
|             | FeaturesDict |       |        |             |
| category    | Text         |       | string |             |
| context     | Text         |       | string |             |
| instruction | Text         |       | string |             |
| response    | Text         |       | string |             |
Last updated 2023-09-09 UTC.
Homepage: https://github.com/databrickslabs/dolly
Source code: tfds.datasets.databricks_dolly.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/databricks_dolly/databricks_dolly_dataset_builder.py)
Versions: 1.0.0 (default): Initial release.
Download size: 12.60 MiB
Dataset size: 12.69 MiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.