sift1m
Pre-trained embeddings for approximate nearest neighbor search using the
Euclidean distance. This dataset consists of two splits:
- 'database': 1,000,000 data points, each with the features 'embedding'
(128 floats), 'index' (int64), and 'neighbors' (an empty list).
- 'test': 10,000 data points, each with the features 'embedding' (128
floats), 'index' (int64), and 'neighbors' (a list of the 'index' and
'distance' of the nearest neighbors in the database).
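To make the relationship between the two splits concrete, here is a minimal sketch of how a 'neighbors' list could be produced by exact (brute-force) Euclidean search. It uses small random toy data rather than the real dataset, and the helper name `exact_neighbors`, the database size of 200, and `k=5` are illustrative assumptions. Whether the dataset stores plain or squared distances, and how many neighbors it keeps per query, is not stated here.

```python
import heapq
import math
import random


def exact_neighbors(query, database, k):
    """Return the k database points closest to `query` by Euclidean distance."""
    scored = (
        {"index": point["index"], "distance": math.dist(query, point["embedding"])}
        for point in database
    )
    # nsmallest returns its results ordered from nearest to farthest.
    return heapq.nsmallest(k, scored, key=lambda n: n["distance"])


rng = random.Random(0)
DIM = 128  # embedding size given in the dataset description

# Toy stand-in for the 'database' split: 'neighbors' is an empty list.
database = [
    {"index": i, "embedding": [rng.random() for _ in range(DIM)], "neighbors": []}
    for i in range(200)
]

# Toy stand-in for one 'test' example, with its neighbors filled in.
query = {"index": 0, "embedding": [rng.random() for _ in range(DIM)]}
query["neighbors"] = exact_neighbors(query["embedding"], database, k=5)
```

Brute force is only practical at toy scale. An approximate method would replace `exact_neighbors` and then be scored against lists like `query["neighbors"]`.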
| Split      | Examples  |
|------------|-----------|
| 'database' | 1,000,000 |
| 'test'     | 10,000    |
FeaturesDict({
'embedding': Tensor(shape=(128,), dtype=float32),
'index': Scalar(shape=(), dtype=int64, description=Index within the split.),
'neighbors': Sequence({
'distance': Scalar(shape=(), dtype=float32, description=Neighbor distance.),
'index': Scalar(shape=(), dtype=int64, description=Neighbor index.),
}),
})
| Feature            | Class        | Shape  | Dtype   | Description                                                          |
|--------------------|--------------|--------|---------|----------------------------------------------------------------------|
|                    | FeaturesDict |        |         |                                                                      |
| embedding          | Tensor       | (128,) | float32 |                                                                      |
| index              | Scalar       |        | int64   | Index within the split.                                              |
| neighbors          | Sequence     |        |         | The computed neighbors, which are only available for the test split. |
| neighbors/distance | Scalar       |        | float32 | Neighbor distance.                                                   |
| neighbors/index    | Scalar       |        | int64   | Neighbor index.                                                      |
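One common use of the 'neighbors' feature is to score an approximate search result against the stored ground truth. The sketch below computes recall@k for a single test example, using a hand-written example dict shaped like the feature structure above. The function name `recall_at_k` and the sample values are hypothetical, and the sort step assumes nothing about the order in which the dataset stores neighbors.

```python
def recall_at_k(retrieved, neighbors, k):
    """Fraction of the k true nearest neighbors that appear in `retrieved[:k]`."""
    # Sort by distance in case the ground-truth list is not already ordered.
    truth = sorted(neighbors, key=lambda n: n["distance"])[:k]
    true_ids = {n["index"] for n in truth}
    if not true_ids:
        raise ValueError("no ground-truth neighbors; is this a 'database' example?")
    return len(true_ids & set(retrieved[:k])) / len(true_ids)


# Hypothetical 'test' example with four ground-truth neighbors.
example = {
    "index": 7,
    "neighbors": [
        {"index": 42, "distance": 0.5},
        {"index": 3, "distance": 0.9},
        {"index": 18, "distance": 1.4},
        {"index": 77, "distance": 2.0},
    ],
}

# Database indices returned by some approximate search, best match first.
approx_result = [42, 18, 5, 3]
score = recall_at_k(approx_result, example["neighbors"], k=3)
```

Here two of the three true nearest neighbors (42 and 18) appear in the top three results, so `score` is 2/3. Averaging this value over all test examples gives a dataset-level recall figure.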
@article{jegou2010product,
title={Product quantization for nearest neighbor search},
author={Jegou, Herve and Douze, Matthijs and Schmid, Cordelia},
journal={IEEE transactions on pattern analysis and machine intelligence},
volume={33},
number={1},
pages={117--128},
year={2010},
publisher={IEEE}
}
Last updated 2024-09-03 UTC.
- Homepage: http://corpus-texmex.irisa.fr/
- Source code: tfds.datasets.sift1m.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/sift1m/sift1m_dataset_builder.py)
- Versions: 1.0.0 (default): Initial release.
- Download size: 500.80 MiB
- Dataset size: 589.49 MiB
- Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): No
- Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
- Figure (tfds.show_examples): Not supported.