clevr
CLEVR is a diagnostic dataset that tests a range of visual reasoning abilities.
It contains minimal biases and has detailed annotations describing the kind of
reasoning each question requires.
Splits:

| Split        | Examples |
|--------------|----------|
| 'test'       | 15,000   |
| 'train'      | 70,000   |
| 'validation' | 15,000   |
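The following is a minimal sketch of loading one split. It assumes `tensorflow_datasets` is installed and that there is room for the roughly 17.7 GiB download. `load_clevr` and `split_fractions` are illustrative helpers and are not part of TFDS. `tfds.load` with a `"name:version"` string and `with_info=True` is the usual TFDS entry point. Because the dataset defines no supervised keys, each element is a feature dict rather than an `(input, label)` pair.

```python
from typing import Dict

# Example counts copied from the split table above.
SPLIT_SIZES: Dict[str, int] = {"train": 70_000, "validation": 15_000, "test": 15_000}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_clevr(split: str = "train", version: str = "3.1.0"):
    """Load one CLEVR split with TFDS (downloads ~17.7 GiB on first use)."""
    # Imported lazily so the pure-Python helper above works without TFDS.
    import tensorflow_datasets as tfds

    # "name:version" pins the dataset version; 3.1.0 is the default
    # version, which adds the question/answer text.
    ds, info = tfds.load(f"clevr:{version}", split=split, with_info=True)
    return ds, info


print(split_fractions(SPLIT_SIZES))
# {'train': 0.7, 'validation': 0.15, 'test': 0.15}
```

For example, calling `load_clevr("validation")` would return the 15,000-example validation split together with its `DatasetInfo`. The import is deferred so that the arithmetic part runs even without TFDS installed.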
Feature structure:

    FeaturesDict({
        'file_name': Text(shape=(), dtype=string),
        'image': Image(shape=(None, None, 3), dtype=uint8),
        'objects': Sequence({
            '3d_coords': Tensor(shape=(3,), dtype=float32),
            'color': ClassLabel(shape=(), dtype=int64, num_classes=8),
            'material': ClassLabel(shape=(), dtype=int64, num_classes=2),
            'pixel_coords': Tensor(shape=(3,), dtype=float32),
            'rotation': float32,
            'shape': ClassLabel(shape=(), dtype=int64, num_classes=3),
            'size': ClassLabel(shape=(), dtype=int64, num_classes=2),
        }),
        'question_answer': Sequence({
            'answer': Text(shape=(), dtype=string),
            'question': Text(shape=(), dtype=string),
        }),
    })
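In TFDS, a `Sequence` over a dict of features is returned as a dict of parallel per-field arrays, with one entry per object, rather than as a list of per-object dicts. The sketch below regroups such a columnar `objects` value into one record per object. Plain Python lists stand in for the arrays that `tfds.as_numpy` would yield. The helper name `objects_to_records` and all sample values are hypothetical.

```python
from typing import Any, Dict, List


def objects_to_records(objects: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a columnar {field: [v0, v1, ...]} dict into [{field: v0}, {field: v1}, ...]."""
    # Every field of a Sequence feature should have one entry per object.
    lengths = {len(values) for values in objects.values()}
    if len(lengths) > 1:
        raise ValueError(f"fields have different lengths: {sorted(lengths)}")
    num_objects = lengths.pop() if lengths else 0
    return [
        {field: values[i] for field, values in objects.items()}
        for i in range(num_objects)
    ]


# Made-up example with two objects; real values come from the dataset.
example_objects = {
    "3d_coords": [[-1.5, 2.0, 0.7], [2.1, -0.4, 0.35]],
    "color": [3, 5],
    "material": [0, 1],
    "pixel_coords": [[120.0, 180.0, 11.2], [300.0, 150.0, 9.8]],
    "rotation": [47.3, 210.0],
    "shape": [2, 0],
    "size": [1, 0],
}

records = objects_to_records(example_objects)
print(len(records), records[0]["color"], records[1]["shape"])  # 2 3 0
```

Keeping the columnar layout is usually better for batched tensor code. Per-object records are mainly convenient for inspection or for writing scene graphs out as JSON.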
Feature documentation:

| Feature                  | Class        | Shape           | Dtype   | Description |
|--------------------------|--------------|-----------------|---------|-------------|
|                          | FeaturesDict |                 |         |             |
| file_name                | Text         |                 | string  |             |
| image                    | Image        | (None, None, 3) | uint8   |             |
| objects                  | Sequence     |                 |         |             |
| objects/3d_coords        | Tensor       | (3,)            | float32 |             |
| objects/color            | ClassLabel   |                 | int64   |             |
| objects/material         | ClassLabel   |                 | int64   |             |
| objects/pixel_coords     | Tensor       | (3,)            | float32 |             |
| objects/rotation         | Tensor       |                 | float32 |             |
| objects/shape            | ClassLabel   |                 | int64   |             |
| objects/size             | ClassLabel   |                 | int64   |             |
| question_answer          | Sequence     |                 |         |             |
| question_answer/answer   | Text         |                 | string  |             |
| question_answer/question | Text         |                 | string  |             |
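`color`, `material`, `shape` and `size` are stored as integer `ClassLabel` ids. `question_answer` holds parallel lists of questions and answers, which are bytes after conversion with `tfds.as_numpy`. The sketch below decodes both. Label names are deliberately not hard-coded. `label_names_from_info` reads them from each `ClassLabel`'s `names`, assuming the nested `info.features["objects"][field]` lookup. The other helpers accept any name list. All helper names are hypothetical, and the names and question in the usage lines are placeholders, not the dataset's actual label ordering or data.

```python
from typing import Any, Dict, List, Sequence, Tuple

CATEGORICAL_FIELDS = ("color", "material", "shape", "size")


def label_names_from_info(info: Any) -> Dict[str, List[str]]:
    """Read the ClassLabel name lists from a tfds DatasetInfo object."""
    objects = info.features["objects"]
    return {field: list(objects[field].names) for field in CATEGORICAL_FIELDS}


def decode_labels(ids: Sequence[int], names: Sequence[str]) -> List[str]:
    """Map integer ClassLabel ids (plain or numpy ints) to their string names."""
    return [names[int(i)] for i in ids]


def qa_pairs(question_answer: Dict[str, Sequence[Any]]) -> List[Tuple[str, str]]:
    """Pair each question with its answer, decoding bytes as produced by tfds.as_numpy."""

    def to_str(value: Any) -> str:
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    return [
        (to_str(q), to_str(a))
        for q, a in zip(question_answer["question"], question_answer["answer"])
    ]


# Placeholder names and values, for illustration only.
size_names = ["small", "large"]
print(decode_labels([1, 0, 1], size_names))  # ['large', 'small', 'large']
print(qa_pairs({"question": [b"How many spheres are there?"], "answer": [b"2"]}))
# [('How many spheres are there?', '2')]
```

With a dataset from `tfds.load`, each element of `tfds.as_numpy(ds)` is a dict. The same calls then apply directly, for example `decode_labels(ex["objects"]["color"], names["color"])` and `qa_pairs(ex["question_answer"])`.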

Citation:

    @inproceedings{johnson2017clevr,
      title={ {CLEVR}: A diagnostic dataset for compositional language and elementary visual reasoning},
      author={Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens and Fei-Fei, Li and Lawrence Zitnick, C and Girshick, Ross},
      booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
      year={2017}
    }
- Homepage: https://cs.stanford.edu/people/jcjohns/clevr/
- Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/clevr)
- Source code: tfds.datasets.clevr.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/clevr/clevr_dataset_builder.py)
- Versions:
  - 3.0.0: No release notes.
  - 3.1.0 (default): Add question/answer text.
- Download size: 17.72 GiB
- Dataset size: 17.75 GiB
- Auto-cached: No (see https://www.tensorflow.org/datasets/performances#auto-caching)
- Supervised keys: None (see the as_supervised documentation, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2024-06-01 UTC.