yelp_polarity_reviews
Large Yelp Review Dataset. This is a dataset for binary sentiment
classification. We provide a set of 560,000 highly polar Yelp reviews for
training, and 38,000 for testing.
ORIGIN
The Yelp reviews dataset consists of reviews from Yelp. It is extracted from
the Yelp Dataset Challenge 2015 data. For more information, please refer to
http://www.yelp.com/dataset
The Yelp reviews polarity dataset is constructed by Xiang Zhang
(xiang.zhang@nyu.edu) from the above dataset. It was first used as a text
classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann
LeCun. Character-level Convolutional Networks for Text Classification. Advances
in Neural Information Processing Systems 28 (NIPS 2015).
DESCRIPTION
The Yelp reviews polarity dataset is constructed by considering stars 1 and 2
negative, and 3 and 4 positive. For each polarity, 280,000 training samples and
19,000 testing samples are taken randomly. In total there are 560,000 training
samples and 38,000 testing samples. Negative polarity is class 1, and positive
polarity is class 2.
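The labelling rule and the sample counts stated above can be written out as a minimal sketch. The function and constant names are hypothetical, and the sketch simply rejects any star rating the description does not mention.

```python
# Class indices as given in the description: 1 = negative, 2 = positive.
NEGATIVE_CLASS = 1
POSITIVE_CLASS = 2


def polarity_class(stars: int) -> int:
    """Map a star rating to a class index, following the description's rule."""
    if stars in (1, 2):
        return NEGATIVE_CLASS
    if stars in (3, 4):
        return POSITIVE_CLASS
    # The description does not say how other ratings are handled,
    # so this sketch refuses them instead of guessing.
    raise ValueError(f"no polarity defined for {stars} stars")


# Per-polarity sample counts from the description.
PER_CLASS_TRAIN = 280_000
PER_CLASS_TEST = 19_000

# Two polarities, balanced, give the stated totals.
total_train = 2 * PER_CLASS_TRAIN
total_test = 2 * PER_CLASS_TEST
```

Because both polarities contribute the same number of samples, the train and test splits are balanced by construction.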
The files train.csv and test.csv contain the training and testing samples,
respectively, as comma-separated values. Each file has 2 columns, corresponding
to class index (1 and 2) and review text. The review texts are escaped using
double quotes ("), and any internal double quote is escaped by 2 double quotes
(""). New lines are escaped by a backslash followed by an "n" character, that
is "\n".
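The escaping rules above map onto Python's standard `csv` module, which handles doubled double quotes by default. The following is a minimal sketch: the two sample rows are made up for illustration, `parse_rows` is a hypothetical helper, and the class index is assumed to be quoted like the text field.

```python
import csv
import io

# Two made-up rows in the described format. The first review contains an
# escaped double quote ("") and an escaped newline (backslash followed by n).
SAMPLE = (
    '"1","He said ""never again"".\\nTerrible service."\n'
    '"2","Great food."\n'
)


def parse_rows(csv_text: str):
    """Yield (class_index, review_text) pairs from CSV text in this format."""
    reader = csv.reader(io.StringIO(csv_text))
    for class_index, review in reader:
        # csv.reader has already turned "" back into a single double quote.
        # Convert the two-character sequence backslash + n into a real newline.
        yield int(class_index), review.replace("\\n", "\n")


for label, text in parse_rows(SAMPLE):
    print(label, repr(text))
```

One caveat of this simple approach: a review that originally contained a literal backslash followed by "n" cannot be distinguished from an escaped newline.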
| Split   | Examples |
|---------|----------|
| 'test'  | 38,000   |
| 'train' | 560,000  |
FeaturesDict({
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'text': Text(shape=(), dtype=string),
})
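A feature structure like this is what `tfds.load` yields for each example. The sketch below shows one way to load a split and print a few decoded examples; it assumes `tensorflow_datasets` is installed and that the dataset can be downloaded on first use. The helper names `load_split` and `preview` and the `EXPECTED_EXAMPLES` table are hypothetical.

```python
# Split sizes as listed in the splits table on this page.
EXPECTED_EXAMPLES = {"train": 560_000, "test": 38_000}


def load_split(split: str = "train"):
    """Return (dataset, info) for one split. Downloads the data on first use."""
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily because tensorflow_datasets is a heavy dependency.
    import tensorflow_datasets as tfds

    return tfds.load("yelp_polarity_reviews", split=split, with_info=True)


def preview(split: str = "test", n: int = 3) -> None:
    """Print the label name and the first 80 characters of n examples."""
    import tensorflow_datasets as tfds

    ds, info = load_split(split)
    for example in tfds.as_numpy(ds.take(n)):
        label = int(example["label"])            # int64 scalar
        text = example["text"].decode("utf-8")   # Text features arrive as bytes
        # int2str reports whatever class names the dataset builder defines.
        print(info.features["label"].int2str(label), text[:80])


# Example usage (triggers a download on first use):
# preview("test", n=3)
```

Reading the class names through `int2str` avoids hard-coding which integer label corresponds to which polarity.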
| Feature | Class        | Shape | Dtype  | Description |
|---------|--------------|-------|--------|-------------|
|         | FeaturesDict |       |        |             |
| label   | ClassLabel   |       | int64  |             |
| text    | Text         |       | string |             |
@article{zhangCharacterlevelConvolutionalNetworks2015,
archivePrefix = {arXiv},
eprinttype = {arxiv},
eprint = {1509.01626},
primaryClass = {cs},
title = {Character-Level {{Convolutional Networks}} for {{Text Classification}}},
abstract = {This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.},
journal = {arXiv:1509.01626 [cs]},
author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann},
month = sep,
year = {2015},
}
Last updated 2022-12-06 UTC.
Homepage: https://course.fast.ai/datasets
Source code: tfds.text.YelpPolarityReviews (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/text/yelp_polarity.py)
Versions: 0.2.0 (default): No release notes.
Download size: 158.67 MiB
Dataset size: 435.14 MiB
Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): No
Supervised keys (see as_supervised doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): ('text', 'label')
Figure (tfds.show_examples): Not supported.