deep1b
Pre-trained embeddings for approximate nearest neighbor search using the cosine
distance. The dataset consists of two splits:
- 'database': 9,990,000 data points, each with the features 'embedding'
(96 floats), 'index' (int64), and 'neighbors' (empty list).
- 'test': 10,000 data points, each with the features 'embedding' (96 floats),
'index' (int64), and 'neighbors' (list of the 'index' and 'distance' of the
nearest neighbors in the database).
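The two splits described above can be read through the standard `tfds.load` and `tfds.as_numpy` calls. The sketch below is a minimal illustration, not the catalog's official snippet: `load_split` and `unpack_example` are hypothetical helper names, and the synthetic `fake_example` (with three made-up neighbors) only mimics the documented feature layout so the unpacking logic can be checked without downloading the data.

```python
import numpy as np


def load_split(split="test"):
    """Return an iterable of NumPy example dicts for one deep1b split.

    Calling this triggers a multi-GiB download on first use, so the
    import is kept local and the function is not invoked in this sketch.
    """
    import tensorflow_datasets as tfds  # lazy import: only needed here

    ds = tfds.load("deep1b", split=split)
    return tfds.as_numpy(ds)


def unpack_example(example):
    """Split one example dict into (embedding, index, neighbor ids, neighbor distances)."""
    embedding = np.asarray(example["embedding"], dtype=np.float32)
    index = int(example["index"])
    # A Sequence of dicts is exposed as a dict of arrays, one entry per neighbor.
    nbr_index = np.asarray(example["neighbors"]["index"], dtype=np.int64)
    nbr_distance = np.asarray(example["neighbors"]["distance"], dtype=np.float32)
    return embedding, index, nbr_index, nbr_distance


# Synthetic stand-in for a 'test' example (values are invented for illustration).
fake_example = {
    "embedding": np.zeros(96, dtype=np.float32),
    "index": np.int64(7),
    "neighbors": {
        "index": np.array([12, 5, 40], dtype=np.int64),
        "distance": np.array([0.1, 0.2, 0.3], dtype=np.float32),
    },
}

embedding, index, nbr_index, nbr_distance = unpack_example(fake_example)
```

For a 'database' example the same helper would return empty neighbor arrays, since that split stores an empty 'neighbors' list.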
Splits:

| Split      | Examples  |
|------------|-----------|
| 'database' | 9,990,000 |
| 'test'     | 10,000    |
Feature structure:

    FeaturesDict({
        'embedding': Tensor(shape=(96,), dtype=float32),
        'index': Scalar(shape=(), dtype=int64, description=Index within the split.),
        'neighbors': Sequence({
            'distance': Scalar(shape=(), dtype=float32, description=Neighbor distance.),
            'index': Scalar(shape=(), dtype=int64, description=Neighbor index.),
        }),
    })
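To make the relationship between 'embedding' and 'neighbors' concrete, here is a small brute-force nearest neighbor sketch. It assumes cosine distance means 1 minus cosine similarity; the page does not state the exact convention behind the stored 'distance' values, so they may differ. The function names and the random 1,000-point database are illustrative only and are not part of the dataset or of TFDS.

```python
import numpy as np


def cosine_distance(queries, database):
    """Pairwise cosine distance (1 - cosine similarity), shape (n_queries, n_database)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    return 1.0 - q @ d.T


def brute_force_neighbors(queries, database, k):
    """Return the indices and distances of the k closest database rows per query."""
    dist = cosine_distance(queries, database)
    order = np.argsort(dist, axis=1)[:, :k]  # ascending distance
    return order, np.take_along_axis(dist, order, axis=1)


# Random 96-dimensional vectors standing in for real deep1b embeddings.
rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 96)).astype(np.float32)
# Queries are slightly perturbed copies of the first five database rows.
queries = database[:5] + 0.01 * rng.standard_normal((5, 96)).astype(np.float32)

nbr_index, nbr_distance = brute_force_neighbors(queries, database, k=10)
```

Exact search like this scales linearly with the database size, which is why a 9,990,000-point split is typically used to benchmark approximate methods instead.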
Feature documentation:

| Feature            | Class        | Shape | Dtype   | Description                                                          |
|--------------------|--------------|-------|---------|----------------------------------------------------------------------|
|                    | FeaturesDict |       |         |                                                                      |
| embedding          | Tensor       | (96,) | float32 |                                                                      |
| index              | Scalar       |       | int64   | Index within the split.                                              |
| neighbors          | Sequence     |       |         | The computed neighbors, which are only available for the test split. |
| neighbors/distance | Scalar       |       | float32 | Neighbor distance.                                                   |
| neighbors/index    | Scalar       |       | int64   | Neighbor index.                                                      |
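Because the test split carries precomputed 'neighbors/index' values, a common use is to score an approximate search method by recall against them. The sketch below assumes recall@k is the fraction of the first k stored neighbors that appear among the first k retrieved results; `recall_at_k` is a hypothetical helper, the toy ids are invented, and the number of neighbors stored per test query is not stated on this page.

```python
import numpy as np


def recall_at_k(retrieved, ground_truth, k):
    """Average overlap between the top-k retrieved ids and the top-k true ids."""
    hits = 0
    for got, want in zip(retrieved, ground_truth):
        got_ids = set(np.asarray(got[:k]).tolist())
        want_ids = set(np.asarray(want[:k]).tolist())
        hits += len(got_ids & want_ids)
    return hits / (k * len(ground_truth))


# Toy ids for two queries (not real deep1b data).
retrieved = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 2, 9], [7, 8, 4]]

recall = recall_at_k(retrieved, ground_truth, k=3)
```

In practice `ground_truth` would be filled from the 'neighbors/index' arrays of the test examples and `retrieved` from the index under evaluation.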
Citation:

    @inproceedings{babenko2016efficient,
      title={Efficient indexing of billion-scale datasets of deep descriptors},
      author={Babenko, Artem and Lempitsky, Victor},
      booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
      pages={2055--2063},
      year={2016}
    }
Last updated 2024-09-03 UTC.
Additional dataset details:
- Homepage: http://sites.skoltech.ru/compvision/noimi/
- Source code: tfds.nearest_neighbors.deep1b.Deep1b (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/nearest_neighbors/deep1b/deep1b.py)
- Versions: 1.0.0 (default): Initial release.
- Download size: 3.58 GiB
- Dataset size: 4.46 GiB
- Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): No
- Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
- Figure (tfds.show_examples): Not supported.