istella
The Istella datasets are three large-scale Learning-to-Rank datasets released by
Istella. Each dataset consists of query-document pairs represented as feature
vectors and corresponding relevance judgment labels.
The dataset contains three versions:
main ("Istella LETOR"): Containing 10,454,629 query-document pairs.
s ("Istella-S LETOR"): Containing 3,408,630 query-document pairs.
x ("Istella-X LETOR"): Containing 26,791,447 query-document pairs.
You can specify whether to use the main, s or x version of the dataset as follows:
import tensorflow_datasets as tfds

# Pick one of the three configurations:
ds = tfds.load("istella/main")
ds = tfds.load("istella/s")
ds = tfds.load("istella/x")
If only istella is specified, the istella/main option is selected by default:
# This is the same as `tfds.load("istella/main")`
ds = tfds.load("istella")
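The config selection above can be wrapped in a small helper. The following is a minimal sketch and not part of TFDS: `istella_name` and `load_istella` are hypothetical names introduced here, while `split` is the standard `tfds.load` argument. The import is deferred so the name-building logic can be checked without triggering a download.

```python
VALID_CONFIGS = ("main", "s", "x")


def istella_name(config="main"):
    """Return the TFDS dataset name for an Istella config (hypothetical helper)."""
    if config not in VALID_CONFIGS:
        raise ValueError(
            f"unknown Istella config {config!r}; expected one of {VALID_CONFIGS}"
        )
    return f"istella/{config}"


def load_istella(config="main", split="train"):
    """Load one split of the chosen config; the first call downloads the data."""
    import tensorflow_datasets as tfds  # deferred: only needed when actually loading

    return tfds.load(istella_name(config), split=split)
```

Calling `load_istella("s", split="train")` would then be equivalent to `tfds.load("istella/s", split="train")`.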
FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 220), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})
| Feature        | Class        | Shape       | Dtype   | Description |
|----------------|--------------|-------------|---------|-------------|
|                | FeaturesDict |             |         |             |
| doc_id         | Tensor       | (None,)     | int64   |             |
| float_features | Tensor       | (None, 220) | float64 |             |
| label          | Tensor       | (None,)     | float64 |             |
| query_id       | Text         |             | string  |             |
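To make the structure concrete: each example groups the documents for one query, so `doc_id`, `label` and `float_features` share the same variable-length leading dimension, while `query_id` is a scalar. The sketch below uses a synthetic example with made-up values (not real Istella data) held in plain Python lists, and a hypothetical helper `flatten_example` that unrolls it into one row per query-document pair. When read through TFDS, the fields would be tensors or NumPy arrays rather than lists.

```python
NUM_FEATURES = 220  # width of 'float_features' per the feature structure above


def flatten_example(example):
    """Yield (query_id, doc_id, label, features) for each document in one example."""
    doc_ids = example["doc_id"]
    labels = example["label"]
    features = example["float_features"]
    if not (len(doc_ids) == len(labels) == len(features)):
        raise ValueError("per-document fields must have the same length")
    for doc_id, label, vector in zip(doc_ids, labels, features):
        if len(vector) != NUM_FEATURES:
            raise ValueError(f"expected {NUM_FEATURES} features, got {len(vector)}")
        yield example["query_id"], doc_id, label, vector


# Synthetic example: one query with three candidate documents (values are made up).
synthetic_example = {
    "query_id": "q-0",
    "doc_id": [0, 1, 2],
    "label": [0.0, 3.0, 1.0],
    "float_features": [[0.0] * NUM_FEATURES for _ in range(3)],
}

rows = list(flatten_example(synthetic_example))
print(len(rows), "query-document pairs for", rows[0][0])
```

The length check reflects the shared `None` dimension in the shapes listed above: a query with a different number of labels than documents would not fit this structure.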
@article{10.1145/2987380,
author = {Dato, Domenico and Lucchese, Claudio and Nardini, Franco Maria and Orlando, Salvatore and Perego, Raffaele and Tonellotto, Nicola and Venturini, Rossano},
title = {Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees},
year = {2016},
publisher = {ACM},
address = {New York, NY, USA},
volume = {35},
number = {2},
issn = {1046-8188},
url = {https://doi.org/10.1145/2987380},
doi = {10.1145/2987380},
journal = {ACM Transactions on Information Systems},
articleno = {15},
numpages = {31},
}
istella/main (default config)
Download size: 1.20 GiB
Dataset size: 1.12 GiB
Splits:
| Split   | Examples |
|---------|----------|
| 'test'  | 9,799    |
| 'train' | 23,219   |
istella/s
Download size: 450.26 MiB
Dataset size: 421.88 MiB
Splits:
| Split   | Examples |
|---------|----------|
| 'test'  | 6,562    |
| 'train' | 19,245   |
| 'vali'  | 7,211    |
istella/x
Download size: 4.42 GiB
Dataset size: 2.46 GiB
Splits:
| Split   | Examples |
|---------|----------|
| 'test'  | 2,000    |
| 'train' | 6,000    |
| 'vali'  | 2,000    |
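The split tables can be cross-checked against the pair counts given in the description. The sketch below assumes that each example in a split corresponds to one query (consistent with the scalar `query_id` in the feature structure, but an assumption nonetheless) and derives the total number of examples and the average number of documents per example for each config. The dictionary and function names are illustrative.

```python
# Query-document pair counts, taken from the description of each config.
PAIRS = {"main": 10_454_629, "s": 3_408_630, "x": 26_791_447}

# Examples per split, taken from the split tables; 'main' lists no 'vali' split.
SPLITS = {
    "main": {"train": 23_219, "test": 9_799},
    "s": {"train": 19_245, "vali": 7_211, "test": 6_562},
    "x": {"train": 6_000, "vali": 2_000, "test": 2_000},
}


def summarize(config):
    """Return (total examples, average pairs per example) for one config."""
    total = sum(SPLITS[config].values())
    return total, PAIRS[config] / total


for config in SPLITS:
    total, avg = summarize(config)
    print(f"istella/{config}: {total:,} examples, about {avg:,.1f} documents per example")
```

Note that, per the tables, `istella/main` lists only 'train' and 'test' splits, so code that relies on a 'vali' split applies to `istella/s` and `istella/x` only.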
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-01-19 UTC.
Homepage: http://quickrank.isti.cnr.it/istella-dataset/
Source code: tfds.ranking.istella.Istella (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/ranking/istella/istella.py)
Versions:
1.0.0: Initial release.
1.0.1: Fix serialization to support float64.
1.1.0: Bundle features into a single 'float_features' feature.
1.2.0 (default): Add query and document identifiers.
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): No
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.