- Description:
 
Criteo Uplift Modeling Dataset
This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab), and Massih-Reza Amini (LIG, Grenoble INP).
This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.
Data description
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 11 features, a treatment indicator and 2 labels (visits and conversions).
Fields
Here is a detailed description of the fields (they are comma-separated in the file):
- f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
- treatment: treatment group (1 = treated, 0 = control)
- conversion: whether a conversion occurred for this user (binary, label)
- visit: whether a visit occurred for this user (binary, label)
- exposure: treatment effect, whether the user has been effectively exposed (binary)
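As a minimal sketch of how such comma-separated rows could be parsed, the snippet below reads an in-memory sample with Python's standard `csv` module. It assumes a header row naming the columns and a `0`/`1` encoding for the binary fields; neither is confirmed by the description above. The two sample rows are made up for illustration and the helper `parse_rows` is hypothetical.

```python
import csv
import io

FEATURES = [f"f{i}" for i in range(12)]  # f0 .. f11
OTHER_FIELDS = ["treatment", "conversion", "visit", "exposure"]

# Two made-up rows for illustration only; not taken from the real file.
# Assumption: the file starts with a header row naming the columns.
SAMPLE = (
    ",".join(FEATURES + OTHER_FIELDS) + "\n"
    + ",".join(["0.5"] * 12 + ["1", "0", "1", "1"]) + "\n"
    + ",".join(["1.5"] * 12 + ["0", "0", "0", "0"]) + "\n"
)


def parse_rows(text):
    """Parse CSV text into a list of dicts with typed values."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {name: float(raw[name]) for name in FEATURES}
        row["treatment"] = int(raw["treatment"])
        # Assumption: binary fields are encoded as the strings "0" and "1".
        for name in ("conversion", "visit", "exposure"):
            row[name] = raw[name] == "1"
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
```

For the full file, a streaming reader or a dataframe library would likely be a better fit than building a list of dicts in memory.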
 
Key figures
- Format: CSV
- Size: 459MB (compressed)
- Rows: 25,309,483
- Average Visit Rate: 0.04132
- Average Conversion Rate: 0.00229
- Treatment Ratio: 0.846
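The rate figures above are simple column means. The sketch below shows the arithmetic on a tiny synthetic sample; the rows are made up, the helper `key_figures` is hypothetical, and the output does not reproduce the published numbers.

```python
def key_figures(rows):
    """Compute column means for the visit, conversion and treatment fields."""
    n = len(rows)
    return {
        "visit_rate": sum(r["visit"] for r in rows) / n,
        "conversion_rate": sum(r["conversion"] for r in rows) / n,
        "treatment_ratio": sum(r["treatment"] for r in rows) / n,
    }


# Synthetic rows for illustration only: 3 treated users, 1 control user.
rows = [
    {"treatment": 1, "visit": 1, "conversion": 1},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 0, "visit": 0, "conversion": 0},
]
figures = key_figures(rows)
```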
 
Tasks
The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally, we can foresee related usages such as, but not limited to:
- benchmark for causal inference
- uplift modeling
- interactions between features and treatment
- heterogeneity of treatment
- benchmark for observational causality methods
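Because the data comes from randomized tests, one simple baseline for the average uplift is the difference in mean outcome between the treated and control groups. The sketch below illustrates that baseline on synthetic rows; it is not the method of the paper, the helper names are hypothetical, and uplift modeling proper aims at per-user estimates rather than a single average.

```python
def mean_outcome(rows, treatment, outcome="visit"):
    """Mean of `outcome` among rows whose treatment flag equals `treatment`."""
    group = [r[outcome] for r in rows if r["treatment"] == treatment]
    return sum(group) / len(group)


def average_uplift(rows, outcome="visit"):
    """Difference in mean outcome: treated group minus control group."""
    return mean_outcome(rows, 1, outcome) - mean_outcome(rows, 0, outcome)


# Synthetic rows for illustration only.
rows = (
    [{"treatment": 1, "visit": v} for v in (1, 1, 0, 0)]    # treated: mean 0.5
    + [{"treatment": 0, "visit": v} for v in (1, 0, 0, 0)]  # control: mean 0.25
)
uplift = average_uplift(rows)
```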
Additional Documentation: Explore on Papers With Code
Homepage: https://ailab.criteo.com/criteo-uplift-prediction-dataset/
Source code: tfds.recommendation.criteo.Criteo
Versions:
- 1.0.0: Initial release.
- 1.0.1 (default): Fixed parsing of fields conversion, visit and exposure.
Download size: 297.00 MiB
Dataset size: 3.55 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'train' | 13,979,592 |
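A minimal loading sketch with `tfds.load` follows. It assumes the dataset is registered under the name `criteo`, which is inferred from the source path rather than stated on this page. The helper `load_criteo_train` and its `batch_size` default are hypothetical.

```python
def load_criteo_train(batch_size=1024):
    """Return a batched tf.data.Dataset over the 'train' split."""
    # Imported lazily so that defining this helper does not require TFDS.
    import tensorflow_datasets as tfds

    # Assumption: the dataset name is "criteo".
    ds = tfds.load("criteo", split="train")
    return ds.batch(batch_size)
```

Calling the helper for the first time triggers the download and preparation of the dataset, so the sizes listed above apply.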
- Feature structure:
 
FeaturesDict({
    'conversion': bool,
    'exposure': bool,
    'f0': float32,
    'f1': float32,
    'f10': float32,
    'f11': float32,
    'f2': float32,
    'f3': float32,
    'f4': float32,
    'f5': float32,
    'f6': float32,
    'f7': float32,
    'f8': float32,
    'f9': float32,
    'treatment': int64,
    'visit': bool,
})
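The feature dictionary lists keys in lexicographic order (f0, f1, f10, f11, f2, ...), so code that builds a feature matrix may want to order the columns explicitly. The sketch below stacks f0 to f11 in numeric order with NumPy; the helper `stack_features` and the synthetic batch are illustrative assumptions, not part of the dataset API.

```python
import numpy as np

FEATURE_NAMES = [f"f{i}" for i in range(12)]  # f0 .. f11 in numeric order


def stack_features(batch):
    """Stack f0..f11 from a dict of 1-D arrays into an (n, 12) float32 matrix."""
    columns = [np.asarray(batch[name], dtype=np.float32) for name in FEATURE_NAMES]
    return np.stack(columns, axis=1)


# Synthetic batch of 3 examples: feature fK holds the constant value K.
batch = {name: np.full(3, i, dtype=np.float32) for i, name in enumerate(FEATURE_NAMES)}
X = stack_features(batch)
```

With this ordering, column `k` of `X` always corresponds to feature `fk`, regardless of how the dictionary keys are sorted.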
- Feature documentation:
 
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| conversion | Tensor | | bool | |
| exposure | Tensor | | bool | |
| f0 | Tensor | | float32 | |
| f1 | Tensor | | float32 | |
| f10 | Tensor | | float32 | |
| f11 | Tensor | | float32 | |
| f2 | Tensor | | float32 | |
| f3 | Tensor | | float32 | |
| f4 | Tensor | | float32 | |
| f5 | Tensor | | float32 | |
| f6 | Tensor | | float32 | |
| f7 | Tensor | | float32 | |
| f8 | Tensor | | float32 | |
| f9 | Tensor | | float32 | |
| treatment | Tensor | | int64 | |
| visit | Tensor | | bool | |
Supervised keys (See as_supervised doc): ({'exposure': 'exposure', 'f0': 'f0', 'f1': 'f1', 'f10': 'f10', 'f11': 'f11', 'f2': 'f2', 'f3': 'f3', 'f4': 'f4', 'f5': 'f5', 'f6': 'f6', 'f7': 'f7', 'f8': 'f8', 'f9': 'f9', 'treatment': 'treatment'}, 'visit')
Figure (tfds.show_examples): Not supported.
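The supervised keys pair a dictionary of inputs (exposure, f0 to f11, treatment) with the visit label, leaving conversion out. The sketch below mimics that split on a plain Python dict to make the structure concrete; the helper `to_supervised` is hypothetical and is not how the library implements it.

```python
# Input keys and label key, as listed in the supervised keys above.
INPUT_KEYS = ["exposure"] + [f"f{i}" for i in range(12)] + ["treatment"]
LABEL_KEY = "visit"


def to_supervised(example):
    """Split a full example dict into an (inputs, label) pair."""
    inputs = {key: example[key] for key in INPUT_KEYS}
    return inputs, example[LABEL_KEY]


# Synthetic example for illustration only.
example = {f"f{i}": float(i) for i in range(12)}
example.update({"treatment": 1, "exposure": True, "visit": True, "conversion": False})

inputs, label = to_supervised(example)
```

Note that under this pairing the conversion field is not available to the model as either input or label.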
- Citation:
 
@inproceedings{Diemert2018,
author = {Diemert, Eustache and Betlei, Artem and Renaudin, Christophe and Amini, Massih-Reza},
title={A Large Scale Benchmark for Uplift Modeling},
publisher = {ACM},
booktitle = {Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August, 20, 2018},
year = {2018}
}