ai2dcaption
This dataset is primarily based on the AI2D Dataset (see
https://prior.allenai.org/projects/diagram-understanding).
See Section 4.1 of our paper (https://arxiv.org/pdf/2310.12128) for the
AI2D-Caption dataset annotation process.
| Split | Examples |
|---------------------------------|----------|
| 'auditor_llm_training_examples' | 30 |
| 'gpt4v' | 4,903 |
| 'llava_15' | 4,902 |
| 'planner_llm_training_examples' | 30 |
| 'test' | 75 |
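A minimal sketch of how the splits listed above might be requested through TFDS. `SPLIT_SIZES`, `split_spec`, and `load_split` are hypothetical helper names introduced here for illustration. `tfds.load` and the `name[:k]` slicing syntax are standard TFDS features, and the sketch assumes an installed `tensorflow-datasets` release that ships the `ai2dcaption` builder.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "auditor_llm_training_examples": 30,
    "gpt4v": 4903,
    "llava_15": 4902,
    "planner_llm_training_examples": 30,
    "test": 75,
}


def split_spec(name, first_n=None):
    """Return a TFDS split string, optionally limited to the first `first_n` examples."""
    if name not in SPLIT_SIZES:
        raise KeyError(f"unknown split: {name!r}")
    if first_n is None:
        return name
    # Clamp so the slice never asks for more examples than the split holds.
    return f"{name}[:{min(first_n, SPLIT_SIZES[name])}]"


def load_split(name, first_n=None):
    """Load one split with TFDS (hypothetical wrapper around tfds.load).

    The import is deferred so the helpers above work without TensorFlow
    installed. The first call may download and prepare the data.
    """
    import tensorflow_datasets as tfds

    return tfds.load("ai2dcaption", split=split_spec(name, first_n))
```

For example, `load_split("test")` would request the full test split, and `load_split("gpt4v", first_n=100)` only its first 100 examples.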
FeaturesDict({
    'caption': Text(shape=(), dtype=string),
    'entities': Sequence({
        'bounds': BBoxFeature(shape=(4,), dtype=float32),
        'cat': ClassLabel(shape=(), dtype=int64, num_classes=10),
        'from': Text(shape=(), dtype=string),
        'id': Text(shape=(), dtype=string),
        'label': Text(shape=(), dtype=string),
        'to': Text(shape=(), dtype=string),
        'type': ClassLabel(shape=(), dtype=int64, num_classes=5),
    }),
    'image': Image(shape=(None, None, 3), dtype=uint8, description=The image of the diagram.),
    'image_filename': Text(shape=(), dtype=string),
    'layout': ClassLabel(shape=(), dtype=int64, num_classes=7),
    'relationships': Sequence(Text(shape=(), dtype=string)),
    'topic': ClassLabel(shape=(), dtype=int64, num_classes=4),
})
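The `bounds` field in the structure above is a `BBoxFeature`, which TFDS encodes as four normalized floats in the order `(ymin, xmin, ymax, xmax)`. The sketch below shows one way to map such a box onto pixel coordinates of the accompanying `image`. `bbox_to_pixels` is a hypothetical helper name, and the sketch assumes this dataset's boxes follow the standard TFDS convention.

```python
def bbox_to_pixels(bounds, image_height, image_width):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel (left, top, right, bottom).

    Assumes the standard TFDS BBoxFeature layout: values in [0, 1],
    y measured along image height and x along image width.
    """
    ymin, xmin, ymax, xmax = (float(v) for v in bounds)
    left = round(xmin * image_width)
    top = round(ymin * image_height)
    right = round(xmax * image_width)
    bottom = round(ymax * image_height)
    return left, top, right, bottom
```

With an example loaded as NumPy arrays, the height and width would come from `example["image"].shape[:2]`.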
| Feature | Class | Shape | Dtype | Description |
|-----------------|----------------|-----------------|---------|---------------------------------|
| | FeaturesDict | | | |
| caption | Text | | string | |
| entities | Sequence | | | |
| entities/bounds | BBoxFeature | (4,) | float32 | |
| entities/cat | ClassLabel | | int64 | |
| entities/from | Text | | string | |
| entities/id | Text | | string | |
| entities/label | Text | | string | |
| entities/to | Text | | string | |
| entities/type | ClassLabel | | int64 | |
| image | Image | (None, None, 3) | uint8 | The image of the diagram. |
| image_filename | Text | | string | Image filename, e.g. "1337.png" |
| layout | ClassLabel | | int64 | |
| relationships | Sequence(Text) | (None,) | string | |
| topic | ClassLabel | | int64 | |
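Because `entities` is a `Sequence` of a feature dict, TFDS returns it as a dict of parallel arrays (one array per sub-feature, each with one row per entity), not as a list of per-entity dicts. The sketch below shows one way to transpose that layout for easier inspection. `entities_to_records` is a hypothetical helper, and the byte-string decoding assumes the example was read through `tfds.as_numpy`, which yields `Text` features as `bytes`.

```python
def entities_to_records(entities):
    """Transpose a dict of parallel per-entity arrays into a list of per-entity dicts."""
    keys = list(entities)
    if not keys:
        return []
    count = len(entities[keys[0]])
    if any(len(entities[key]) != count for key in keys):
        raise ValueError("all entity fields must have the same length")
    records = []
    for index in range(count):
        record = {}
        for key in keys:
            value = entities[key][index]
            # Text features arrive as bytes when read as NumPy; decode for readability.
            record[key] = value.decode("utf-8") if isinstance(value, bytes) else value
        records.append(record)
    return records
```

Each resulting record then carries the `bounds`, `cat`, `from`, `id`, `label`, `to`, and `type` values of a single entity.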

@inproceedings{Zala2024DiagrammerGPT,
author = {Abhay Zala and Han Lin and Jaemin Cho and Mohit Bansal},
title = {DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning},
year = {2024},
booktitle = {COLM},
}
Last updated 2025-03-14 UTC.