imdb_reviews
Stay organized with collections
Save and categorize content based on your preferences.
Large Movie Review Dataset. This is a dataset for binary sentiment
classification containing substantially more data than previous benchmark
datasets. We provide a set of 25,000 highly polar movie reviews for training,
and 25,000 for testing. There is additional unlabeled data for use as well.
Split |
Examples |
'test' |
25,000 |
'train' |
25,000 |
'unsupervised' |
50,000 |
FeaturesDict({
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'text': Text(shape=(), dtype=string),
})
Feature |
Class |
Shape |
Dtype |
Description |
|
FeaturesDict |
|
|
|
label |
ClassLabel |
|
int64 |
|
text |
Text |
|
string |
|
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
imdb_reviews/plain_text (default config)
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-20 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-09-20 UTC."],[],[],null,["# imdb_reviews\n\n\u003cbr /\u003e\n\n- **Description**:\n\nLarge Movie Review Dataset. This is a dataset for binary sentiment\nclassification containing substantially more data than previous benchmark\ndatasets. We provide a set of 25,000 highly polar movie reviews for training,\nand 25,000 for testing. There is additional unlabeled data for use as well.\n\n- **Additional Documentation** :\n [Explore on Papers With Code\n north_east](https://paperswithcode.com/dataset/imdb-movie-reviews)\n\n- **Config description**: Plain text\n\n- **Homepage** :\n [http://ai.stanford.edu/\\~amaas/data/sentiment/](http://ai.stanford.edu/%7Eamaas/data/sentiment/)\n\n- **Source code** :\n [`tfds.datasets.imdb_reviews.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/imdb_reviews/imdb_reviews_dataset_builder.py)\n\n- **Versions**:\n\n - **`1.0.0`** (default): New split API (\u003chttps://tensorflow.org/datasets/splits\u003e)\n- **Download size** : `80.23 MiB`\n\n- **Dataset size** : `129.83 MiB`\n\n- **Auto-cached**\n ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):\n Yes\n\n- **Splits**:\n\n| Split | Examples |\n|------------------|----------|\n| `'test'` | 25,000 |\n| `'train'` | 25,000 |\n| `'unsupervised'` | 50,000 |\n\n- **Feature structure**:\n\n FeaturesDict({\n 'label': ClassLabel(shape=(), dtype=int64, num_classes=2),\n 'text': Text(shape=(), dtype=string),\n })\n\n- **Feature documentation**:\n\n| Feature | Class | Shape | Dtype | Description |\n|---------|--------------|-------|--------|-------------|\n| | FeaturesDict | | | |\n| label | ClassLabel | | int64 | |\n| text | Text | | string | |\n\n- **Supervised keys** (See\n [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):\n `('text', 'label')`\n\n- **Figure**\n ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):\n Not supported.\n\n- **Examples**\n ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):\n\nDisplay examples... \n\n- **Citation**:\n\n @InProceedings{maas-EtAl:2011:ACL-HLT2011,\n author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},\n title = {Learning Word Vectors for Sentiment Analysis},\n booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},\n month = {June},\n year = {2011},\n address = {Portland, Oregon, USA},\n publisher = {Association for Computational Linguistics},\n pages = {142--150},\n url = {http://www.aclweb.org/anthology/P11-1015}\n }\n\nimdb_reviews/plain_text (default config)\n----------------------------------------"]]