# tff.simulation.datasets.celeba.load_data

Last updated 2024-09-20 UTC.

[View source on GitHub](https://github.com/tensorflow/federated/blob/v0.87.0)

Loads the Federated CelebA dataset.

    tff.simulation.datasets.celeba.load_data(
        split_by_clients=True, cache_dir=None
    )

Downloads and caches the dataset locally. If previously downloaded, tries to
load the dataset from cache.

This dataset is derived from the
[LEAF repository](https://github.com/TalwalkarLab/leaf) preprocessing of the
[CelebA dataset](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html),
grouping examples by celebrity id. Details about LEAF were published in
["LEAF: A Benchmark for Federated
Settings"](https://arxiv.org/abs/1812.01097), and details about CelebA were
published in ["Deep Learning Face Attributes in the
Wild"](https://arxiv.org/abs/1411.7766).

The raw CelebA dataset contains 10,177 unique identities. During LEAF
preprocessing, all clients with fewer than 5 examples are removed; this leaves
9,343 clients.

The data is available with train and test splits by clients or by examples.
That is, when split by clients, ~90% of clients are selected for the train
set, ~10% of clients are selected for test, and all the examples for a given
client are part of the same data split. When split by examples, each client
appears in both the train data and the test data, with ~90% of the examples
on each client selected for train and ~10% of the examples selected for test.

#### Data set sizes:

*split_by_clients=True*:

- train: 8,408 clients, 180,429 total examples
- test: 935 clients, 19,859 total examples

*split_by_clients=False*:

- train: 9,343 clients, 177,457 total examples
- test: 9,343 clients, 22,831 total examples

The `tf.data.Datasets` returned by
[`tff.simulation.datasets.ClientData.create_tf_dataset_for_client`](../../../../tff/simulation/datasets/ClientData#create_tf_dataset_for_client) will yield
`collections.OrderedDict` objects at each iteration. These objects have a
key/value pair storing the image of the celebrity:

- `'image'`: a [`tf.Tensor`](https://www.tensorflow.org/api_docs/python/tf/Tensor) with `dtype=tf.int64` and shape [84, 84, 3], containing the red/blue/green pixels of the image. Each pixel is a value in the range [0, 255].

The OrderedDict objects also contain an additional 40 key/value pairs for the
celebrity image attributes, each of the format:

- `{attribute name}`: a [`tf.Tensor`](https://www.tensorflow.org/api_docs/python/tf/Tensor) with `dtype=tf.bool` and shape [1], set to True if the celebrity has this attribute in the image, or False if they don't.

The attribute names are:
'five_o_clock_shadow', 'arched_eyebrows', 'attractive', 'bags_under_eyes',
'bald', 'bangs', 'big_lips', 'big_nose', 'black_hair', 'blond_hair',
'blurry', 'brown_hair', 'bushy_eyebrows', 'chubby', 'double_chin',
'eyeglasses', 'goatee', 'gray_hair', 'heavy_makeup', 'high_cheekbones',
'male', 'mouth_slightly_open', 'mustache', 'narrow_eyes', 'no_beard',
'oval_face', 'pale_skin', 'pointy_nose', 'receding_hairline', 'rosy_cheeks',
'sideburns', 'smiling', 'straight_hair', 'wavy_hair', 'wearing_earrings',
'wearing_hat', 'wearing_lipstick', 'wearing_necklace', 'wearing_necktie',
'young'

> **Note:** The CelebA dataset may contain potential bias. The [fairness indicators TF tutorial](https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study) goes into detail about several considerations to keep in mind while using the CelebA dataset.

| Args | |
|--------------------|---|
| `split_by_clients` | There are 9,343 clients in the federated CelebA dataset with 5 or more examples. If this argument is True, clients are divided into train and test groups, with 8,408 and 935 clients respectively. If this argument is False, the data is divided by examples instead, i.e., all clients participate in both the train and test groups, with ~90% of the examples belonging to the train group and the rest belonging to the test group. |
| `cache_dir` | (Optional) directory to cache the downloaded file. If `None`, caches in Keras' default cache directory. |

| Returns |
|---|
| Tuple of `(train, test)` where the tuple elements are [`tff.simulation.datasets.ClientData`](../../../../tff/simulation/datasets/ClientData) objects. |
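
The difference between the two split modes can be illustrated with a small, library-free sketch. This is not LEAF's or TFF's actual splitting procedure: the helper names `split_by_clients` and `split_by_examples`, the toy client ids, and the 0.9 fraction applied by simple rounding are all assumptions made for illustration. The final assertions only re-check the client and example counts quoted in the data set sizes above.

```python
import random
from collections import OrderedDict


def split_by_clients(client_examples, train_fraction=0.9, seed=0):
    """Hypothetical helper: assign whole clients to train or test."""
    client_ids = sorted(client_examples)
    random.Random(seed).shuffle(client_ids)
    n_train = round(train_fraction * len(client_ids))
    train = OrderedDict((cid, client_examples[cid]) for cid in client_ids[:n_train])
    test = OrderedDict((cid, client_examples[cid]) for cid in client_ids[n_train:])
    return train, test


def split_by_examples(client_examples, train_fraction=0.9):
    """Hypothetical helper: keep every client in both splits and
    divide each client's own examples between train and test."""
    train, test = OrderedDict(), OrderedDict()
    for cid, examples in client_examples.items():
        n_train = round(train_fraction * len(examples))
        train[cid] = examples[:n_train]
        test[cid] = examples[n_train:]
    return train, test


# Toy data (assumption): 20 clients with 10 placeholder examples each.
toy = {f"client_{i:02d}": list(range(10)) for i in range(20)}

clients_train, clients_test = split_by_clients(toy)
examples_train, examples_test = split_by_examples(toy)

# Split by clients: no client id appears in both groups.
assert set(clients_train).isdisjoint(clients_test)
# Split by examples: every client id appears in both groups.
assert set(examples_train) == set(examples_test) == set(toy)

# Consistency check on the documented sizes: both modes cover the same
# 9,343 clients and the same total number of examples.
assert 8_408 + 935 == 9_343
assert 180_429 + 19_859 == 177_457 + 22_831 == 200_288
```

With the client split, a model is evaluated on clients it never saw during training; with the example split, it is evaluated on held-out examples from the same clients it trained on.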