glove100_angular
Pre-trained Global Vectors for Word Representation (GloVe) embeddings for
approximate nearest neighbor search. This dataset consists of two splits:
- 'database': 1,183,514 data points, each with the features 'embedding'
(100 floats), 'index' (int64), and 'neighbors' (an empty list).
- 'test': 10,000 data points, each with the features 'embedding' (100
floats), 'index' (int64), and 'neighbors' (a list of the 'index' and
'distance' of the nearest neighbors in the database).
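The 'neighbors' entries in the test split pair each query with database indices and distances. As a rough illustration of what such ground truth could look like, the sketch below runs an exact brute-force search over a tiny made-up database. It assumes that "angular" means cosine distance (1 minus cosine similarity), which this page does not state; the function names and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity (ASSUMED meaning of 'angular' here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def brute_force_neighbors(query, database, k):
    """Exact k nearest neighbors as (index, distance) pairs, closest first."""
    scored = [(i, cosine_distance(query, vec)) for i, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional stand-ins for the real 100-dimensional embeddings.
toy_database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
toy_query = [2.0, 0.0, 0.0]

neighbors = brute_force_neighbors(toy_query, toy_database, k=2)
```

A scan like this is linear in the database size per query, which is the cost that approximate nearest neighbor methods try to avoid on the 1,183,514-point 'database' split.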
| Split      | Examples  |
|------------|-----------|
| 'database' | 1,183,514 |
| 'test'     | 10,000    |
FeaturesDict({
    'embedding': Tensor(shape=(100,), dtype=float32),
    'index': Scalar(shape=(), dtype=int64, description=Index within the split.),
    'neighbors': Sequence({
        'distance': Scalar(shape=(), dtype=float32, description=Neighbor distance.),
        'index': Scalar(shape=(), dtype=int64, description=Neighbor index.),
    }),
})
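Assuming the `tensorflow_datasets` package is installed, a split with this feature structure could be read roughly as follows. `tfds.load` and `tfds.as_numpy` are existing TFDS calls; the helper names `neighbor_pairs` and `iter_test_queries` are made up for illustration, and the sketch assumes the nested 'neighbors' Sequence is returned as a dict of parallel arrays.

```python
def neighbor_pairs(example):
    """Zip the parallel 'index' and 'distance' arrays into (index, distance) pairs.

    ASSUMPTION: 'neighbors' arrives as a dict of equal-length arrays, one per
    sub-feature. For the 'database' split both arrays are empty.
    """
    neighbors = example["neighbors"]
    return [(int(i), float(d)) for i, d in zip(neighbors["index"], neighbors["distance"])]


def iter_test_queries():
    """Yield (index, embedding, neighbor pairs) for each 'test' example.

    TFDS is imported lazily so neighbor_pairs works without it installed.
    Calling this downloads and prepares the dataset on first use.
    """
    import tensorflow_datasets as tfds

    dataset = tfds.load("glove100_angular", split="test")
    for example in tfds.as_numpy(dataset):
        yield int(example["index"]), example["embedding"], neighbor_pairs(example)


# Hand-written example shaped like the feature structure (not real data).
fake_example = {
    "embedding": [0.0] * 100,
    "index": 7,
    "neighbors": {"index": [3, 9], "distance": [0.25, 0.5]},
}
pairs = neighbor_pairs(fake_example)
```

Keeping the pairing logic separate from the loader means it can be checked on hand-written dicts before committing to the full download.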
| Feature            | Class        | Shape  | Dtype   | Description                                                          |
|--------------------|--------------|--------|---------|----------------------------------------------------------------------|
|                    | FeaturesDict |        |         |                                                                      |
| embedding          | Tensor       | (100,) | float32 |                                                                      |
| index              | Scalar       |        | int64   | Index within the split.                                              |
| neighbors          | Sequence     |        |         | The computed neighbors, which are only available for the test split. |
| neighbors/distance | Scalar       |        | float32 | Neighbor distance.                                                   |
| neighbors/index    | Scalar       |        | int64   | Neighbor index.                                                      |
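Since 'neighbors' is populated only for the test split, one plausible use is scoring an approximate search method against it. The sketch below computes recall@k for a single query; the function name, the choice of recall@k as the metric, and the sample indices are assumptions for illustration, not something this page prescribes.

```python
def recall_at_k(true_indices, predicted_indices, k):
    """Fraction of the k true nearest neighbors found in the top-k predictions."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(true_indices[:k])
    found = truth.intersection(predicted_indices[:k])
    return len(found) / k


# Made-up database indices: ground truth as it might appear in
# 'neighbors/index' vs. an approximate method's output, closest first.
ground_truth = [12, 5, 40, 7]
approximate = [5, 12, 99, 7]

score = recall_at_k(ground_truth, approximate, k=4)
```

Comparing sets rather than ordered lists means a method is not penalized for returning the right neighbors in a different order.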
@inproceedings{pennington2014glove,
  author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  title = {GloVe: Global Vectors for Word Representation},
  year = {2014},
  pages = {1532--1543},
  url = {http://www.aclweb.org/anthology/D14-1162},
}
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-03 UTC.
- Homepage: https://nlp.stanford.edu/projects/glove/
- Source code: tfds.nearest_neighbors.glove_100_angular.Glove100Angular (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/nearest_neighbors/glove_100_angular/glove_100_angular.py)
- Versions: 1.0.0 (default): Initial release.
- Download size: 462.93 MiB
- Dataset size: 567.90 MiB
- Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): No
- Supervised keys (see the as_supervised doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
- Figure (tfds.show_examples): Not supported.