criteo
Criteo Uplift Modeling Dataset
This dataset is released along with the paper “A Large Scale Benchmark for
Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI
Lab) and Massih-Reza Amini (LIG, Grenoble INP).
This work was published at the AdKDD 2018 Workshop, in conjunction with KDD 2018.
Data description
This dataset is constructed by assembling data resulting from several
incrementality tests, a particular randomized trial procedure where a random
part of the population is prevented from being targeted by advertising. It
consists of 25M rows, each one representing a user with 12 features (f0 to f11),
a treatment indicator and 2 labels (visits and conversions).
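Because the control group is held out at random, the difference in outcome rates between treated and control users gives a simple estimate of the average uplift. The following is a minimal sketch of that idea, not the paper's method: the function name `average_uplift` and the sample rows are invented for illustration, and the rows are assumed to be available as (treatment, outcome) pairs with 0/1 values.

```python
def average_uplift(rows):
    """Estimate uplift as outcome rate (treated) minus outcome rate (control).

    `rows` is an iterable of (treatment, outcome) pairs with 0/1 values.
    """
    treated_n = treated_hits = control_n = control_hits = 0
    for treatment, outcome in rows:
        if treatment == 1:
            treated_n += 1
            treated_hits += outcome
        else:
            control_n += 1
            control_hits += outcome
    if treated_n == 0 or control_n == 0:
        raise ValueError("need at least one treated and one control row")
    return treated_hits / treated_n - control_hits / control_n


# Illustrative rows only (hypothetical), not real dataset values.
sample = [(1, 1), (1, 0), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
uplift = average_uplift(sample)
```

This difference in means is only a population-level baseline; uplift models aim to predict how the effect varies from user to user.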
Fields
Here is a detailed description of the fields (they are comma-separated in the
file):
- f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense,
float)
- treatment: treatment group (1 = treated, 0 = control)
- conversion: whether a conversion occurred for this user (binary, label)
- visit: whether a visit occurred for this user (binary, label)
- exposure: treatment effect, whether the user has been effectively exposed
(binary)
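The field list above maps naturally onto a typed record. The sketch below parses comma-separated text with Python's `csv` module. It assumes the file carries a header row with exactly these column names, which the description does not state. `parse_row` and the two sample records are hypothetical.

```python
import csv
import io

FEATURE_COLUMNS = [f"f{i}" for i in range(12)]  # f0 .. f11
BINARY_COLUMNS = ["treatment", "conversion", "visit", "exposure"]


def parse_row(raw):
    """Convert one CSV record (a dict of strings) into typed values."""
    row = {name: float(raw[name]) for name in FEATURE_COLUMNS}
    for name in BINARY_COLUMNS:
        row[name] = int(raw[name])
    return row


# Two invented records; real values will differ.
lines = [
    ",".join(FEATURE_COLUMNS + BINARY_COLUMNS),  # assumed header row
    ",".join(["0.5"] * 12 + ["1", "0", "1", "1"]),
    ",".join(["1.5"] * 12 + ["0", "0", "0", "0"]),
]
text = "\n".join(lines) + "\n"

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(text))]
```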
Key figures
- Format: CSV
- Size: 459MB (compressed)
- Rows: 25,309,483
- Average Visit Rate: .04132
- Average Conversion Rate: .00229
- Treatment Ratio: .846
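Each of the three rates above can be read as the mean of a binary column. A minimal sketch follows, assuming "Treatment Ratio" means the share of rows with treatment = 1. The helper `key_figures` and the sample rows are invented for illustration and do not reproduce the published numbers.

```python
def key_figures(rows):
    """Return the mean of the visit, conversion and treatment columns."""
    rows = list(rows)
    n = len(rows)
    if n == 0:
        raise ValueError("no rows")
    return {
        "visit_rate": sum(r["visit"] for r in rows) / n,
        "conversion_rate": sum(r["conversion"] for r in rows) / n,
        "treatment_ratio": sum(r["treatment"] for r in rows) / n,
    }


# Invented rows for illustration; not real dataset values.
rows = [
    {"treatment": 1, "visit": 1, "conversion": 0},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 1, "visit": 1, "conversion": 1},
    {"treatment": 0, "visit": 0, "conversion": 0},
]
figures = key_figures(rows)
```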
Tasks
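The dataset was collected and prepared with uplift prediction in mind as the
main task. Additionally, we foresee related uses such as, but not limited
to:
- benchmark for causal inference
- uplift modeling
- interactions between features and treatment
- heterogeneity of treatment
- benchmark for observational causality methods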
Splits:

| Split   | Examples   |
|---------|------------|
| 'train' | 13,979,592 |

Feature structure:
FeaturesDict({
'conversion': bool,
'exposure': bool,
'f0': float32,
'f1': float32,
'f10': float32,
'f11': float32,
'f2': float32,
'f3': float32,
'f4': float32,
'f5': float32,
'f6': float32,
'f7': float32,
'f8': float32,
'f9': float32,
'treatment': int64,
'visit': bool,
})
Feature documentation:

| Feature    | Class        | Shape | Dtype   | Description |
|------------|--------------|-------|---------|-------------|
|            | FeaturesDict |       |         |             |
| conversion | Tensor       |       | bool    |             |
| exposure   | Tensor       |       | bool    |             |
| f0         | Tensor       |       | float32 |             |
| f1         | Tensor       |       | float32 |             |
| f10        | Tensor       |       | float32 |             |
| f11        | Tensor       |       | float32 |             |
| f2         | Tensor       |       | float32 |             |
| f3         | Tensor       |       | float32 |             |
| f4         | Tensor       |       | float32 |             |
| f5         | Tensor       |       | float32 |             |
| f6         | Tensor       |       | float32 |             |
| f7         | Tensor       |       | float32 |             |
| f8         | Tensor       |       | float32 |             |
| f9         | Tensor       |       | float32 |             |
| treatment  | Tensor       |       | int64   |             |
| visit      | Tensor       |       | bool    |             |

Supervised keys (see the as_supervised doc):
({'exposure': 'exposure', 'f0': 'f0', 'f1': 'f1', 'f10': 'f10', 'f11':
'f11', 'f2': 'f2', 'f3': 'f3', 'f4': 'f4', 'f5': 'f5', 'f6': 'f6', 'f7':
'f7', 'f8': 'f8', 'f9': 'f9', 'treatment': 'treatment'}, 'visit')
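The supervised keys pair a dictionary of input features with the `visit` label. `conversion` appears in neither part. The sketch below applies the same split to a plain Python dict to make that structure explicit. It is not the TFDS implementation, and `to_supervised` and the example values are hypothetical.

```python
FEATURE_KEYS = ["exposure"] + [f"f{i}" for i in range(12)] + ["treatment"]
LABEL_KEY = "visit"


def to_supervised(example):
    """Split a full example dict into (features, label) per the keys above."""
    features = {key: example[key] for key in FEATURE_KEYS}
    return features, example[LABEL_KEY]


# Invented example; real values will differ.
example = {f"f{i}": 0.0 for i in range(12)}
example.update({"treatment": 1, "exposure": True,
                "visit": True, "conversion": False})

features, label = to_supervised(example)
```

Note that `treatment` and `exposure` are delivered as inputs under this split, so an uplift workflow still has access to the treatment indicator.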
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{Diemert2018,
  author = {Diemert, Eustache and Betlei, Artem and Renaudin, Christophe and Amini, Massih-Reza},
  title = {A Large Scale Benchmark for Uplift Modeling},
  publisher = {ACM},
  booktitle = {Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August 20, 2018},
  year = {2018}
}
Last updated 2022-12-22 UTC.
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/criteo)
Homepage: https://ailab.criteo.com/criteo-uplift-prediction-dataset/
Source code: tfds.recommendation.criteo.Criteo (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/recommendation/criteo/criteo.py)
Versions:
- 1.0.0: Initial release.
- 1.0.1 (default): Fixed parsing of fields conversion, visit and exposure.
Download size: 297.00 MiB
Dataset size: 3.55 GiB
Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): No