mslr_web
MSLR-WEB comprises two large-scale Learning-to-Rank datasets released by
Microsoft Research. The first dataset (called "30k") contains roughly 30,000
queries (the split counts listed below sum to 31,531) and the second dataset
(called "10k") contains 10,000 queries. Each dataset consists of
query-document pairs represented as feature vectors, with corresponding
relevance judgment labels.
You can specify whether to use the "10k" or "30k" version of the dataset, and a
corresponding fold, as follows:
import tensorflow_datasets as tfds

ds = tfds.load("mslr_web/30k_fold1")
If only `mslr_web` is specified, the `mslr_web/10k_fold1` option is selected by
default:
# This is the same as `tfds.load("mslr_web/10k_fold1")`
ds = tfds.load("mslr_web")
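The config names follow a `<size>_fold<n>` pattern, so they can be built programmatically when looping over folds. The sketch below is a minimal illustration: `mslr_config_name` and `load_fold` are hypothetical helper names, not part of TFDS, and the code assumes folds are numbered 1 to 5, as in the config list on this page.

```python
VALID_SIZES = ("10k", "30k")
VALID_FOLDS = range(1, 6)  # folds 1..5, per the configs listed on this page


def mslr_config_name(size="10k", fold=1):
    """Builds a config string such as 'mslr_web/30k_fold1' (hypothetical helper)."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if not isinstance(fold, int) or fold not in VALID_FOLDS:
        raise ValueError(f"fold must be an integer in 1..5, got {fold!r}")
    return f"mslr_web/{size}_fold{fold}"


def load_fold(size="10k", fold=1, split="train"):
    """Loads one split of one fold; the first call triggers a large download."""
    # Imported lazily so mslr_config_name can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(mslr_config_name(size, fold), split=split)
```

Note that the validation split is named `'vali'`, so a call such as `load_fold("30k", 2, split="vali")` would request it.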
FeaturesDict({
'doc_id': Tensor(shape=(None,), dtype=int64),
'float_features': Tensor(shape=(None, 136), dtype=float64),
'label': Tensor(shape=(None,), dtype=float64),
'query_id': Text(shape=(), dtype=string),
})
| Feature        | Class        | Shape       | Dtype   | Description |
|----------------|--------------|-------------|---------|-------------|
|                | FeaturesDict |             |         |             |
| doc_id         | Tensor       | (None,)     | int64   |             |
| float_features | Tensor       | (None, 136) | float64 |             |
| label          | Tensor       | (None,)     | float64 |             |
| query_id       | Text         |             | string  |             |
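The shapes above suggest that each example holds the documents of one query as a list (`label` has shape `(None,)` while `query_id` is a scalar), so ranking metrics would be computed per example. As an illustration, the following sketch computes NDCG@k in plain Python from a list of labels and a list of model scores. The exponential gain `2^label - 1` and the `log2` discount are one common convention, chosen here as an assumption; the function names are hypothetical and not part of TFDS.

```python
import math


def dcg_at_k(labels_in_rank_order, k):
    """Discounted cumulative gain of the top-k labels, with gain 2^label - 1."""
    return sum(
        (2.0 ** label - 1.0) / math.log2(rank + 2)  # rank is 0-based
        for rank, label in enumerate(labels_in_rank_order[:k])
    )


def ndcg_at_k(labels, scores, k=10):
    """NDCG@k for one query; labels and scores are parallel per-document lists."""
    # Order the documents by descending model score.
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: -p[0])]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    # NDCG is undefined when no document is relevant; return 0.0 by convention.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In practice the `labels` list would come from one example's `label` tensor and the `scores` list from a model applied to that example's `float_features`.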
@article{DBLP:journals/corr/QinL13,
author = {Tao Qin and Tie{-}Yan Liu},
title = {Introducing {LETOR} 4.0 Datasets},
journal = {CoRR},
volume = {abs/1306.2597},
year = {2013},
url = {http://arxiv.org/abs/1306.2597},
timestamp = {Mon, 01 Jul 2013 20:31:25 +0200},
biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/QinL13},
bibsource = {dblp computer science bibliography, http://dblp.org}
}
mslr_web/10k_fold1 (default config)
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 2,000    |
| 'train'   | 6,000    |
| 'vali'    | 2,000    |
mslr_web/10k_fold2
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 2,000    |
| 'train'   | 6,000    |
| 'vali'    | 2,000    |
mslr_web/10k_fold3
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 2,000    |
| 'train'   | 6,000    |
| 'vali'    | 2,000    |
mslr_web/10k_fold4
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 2,000    |
| 'train'   | 6,000    |
| 'vali'    | 2,000    |
mslr_web/10k_fold5
Download size: 1.15 GiB
Dataset size: 310.08 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 2,000    |
| 'train'   | 6,000    |
| 'vali'    | 2,000    |
mslr_web/30k_fold1
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 6,306    |
| 'train'   | 18,919   |
| 'vali'    | 6,306    |
mslr_web/30k_fold2
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 6,307    |
| 'train'   | 18,918   |
| 'vali'    | 6,306    |
mslr_web/30k_fold3
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 6,306    |
| 'train'   | 18,918   |
| 'vali'    | 6,307    |
mslr_web/30k_fold4
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 6,306    |
| 'train'   | 18,919   |
| 'vali'    | 6,306    |
mslr_web/30k_fold5
Download size: 3.59 GiB
Dataset size: 964.09 MiB
Splits:
| Split     | Examples |
|-----------|----------|
| 'test'    | 6,306    |
| 'train'   | 18,919   |
| 'vali'    | 6,306    |
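The split sizes listed for each config can be cross-checked with a few lines of arithmetic. This sketch copies the counts from this page into a dictionary (the names `SPLITS` and `summarize` are illustrative) and computes each config's total and the share of examples in the training split.

```python
# Split sizes copied from this page, in the order (test, train, vali).
SPLITS = {
    "10k_fold1": (2000, 6000, 2000),
    "10k_fold2": (2000, 6000, 2000),
    "10k_fold3": (2000, 6000, 2000),
    "10k_fold4": (2000, 6000, 2000),
    "10k_fold5": (2000, 6000, 2000),
    "30k_fold1": (6306, 18919, 6306),
    "30k_fold2": (6307, 18918, 6306),
    "30k_fold3": (6306, 18918, 6307),
    "30k_fold4": (6306, 18919, 6306),
    "30k_fold5": (6306, 18919, 6306),
}


def summarize(splits):
    """Returns {config: (total_examples, train_fraction)} for each config."""
    return {
        name: (sum(counts), counts[1] / sum(counts))
        for name, counts in splits.items()
    }


summary = summarize(SPLITS)
for name, (total, train_fraction) in summary.items():
    print(f"{name}: {total} examples, {train_fraction:.1%} in train")
```

Every 10k fold sums to 10,000 examples and every 30k fold to 31,531, with about 60% of the examples in the training split in each case.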
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2023-01-19 UTC.
Homepage: https://www.microsoft.com/en-us/research/project/mslr/
Source code: tfds.ranking.mslr_web.MslrWeb (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/ranking/mslr_web/mslr_web.py)
Versions:
- 1.0.0: Initial release.
- 1.1.0: Bundle features into a single 'float_features' feature.
- 1.2.0 (default): Add query and document identifiers.
Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): No
Supervised keys (see the as_supervised doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): None
Figure (tfds.show_examples): Not supported.