tao
Warning: Manual download required. See instructions below.
The TAO dataset is a large video object detection dataset consisting of 2,907
high-resolution videos and 833 object categories. Storing this dataset
requires at least 300 GB of free disk space.
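Because of the 300 GB requirement, it can be worth checking the target drive before preparing the dataset. This is a minimal sketch using only the Python standard library. The helper name `has_free_space` is hypothetical and not part of TFDS. It treats "GB" as decimal gigabytes, which is an assumption.

```python
import shutil
from pathlib import Path

# Minimum free space stated in the TAO description, in gigabytes.
REQUIRED_GB = 300


def has_free_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free.

    Hypothetical helper for illustration; TFDS does not provide it.
    """
    target = Path(path).expanduser()
    # Walk up to the nearest existing directory so the check also works
    # before the data directory has been created.
    while not target.exists():
        target = target.parent
    free_bytes = shutil.disk_usage(target).free
    # Assumption: decimal gigabytes (10**9 bytes).
    return free_bytes >= required_gb * 10**9
```

For example, `has_free_space("~/tensorflow_datasets")` checks the drive that holds the default TFDS data directory.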
Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/tao)
Homepage: https://taodataset.org/
Source code: tfds.video.tao.Tao
Versions:
  1.1.0 (default): Added test split.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the
source data manually into download_config.manual_dir (defaults to
~/tensorflow_datasets/downloads/manual/):

Some TAO files (HACS and AVA videos) must be downloaded manually because a
MOTChallenge login is required. Download them by following the instructions
at https://motchallenge.net/tao_download.php, then move the resulting .zip
files to ~/tensorflow_datasets/downloads/manual/.

If the data requiring manual download is not present, it is skipped, and only
the data that does not require manual download is used.
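Because missing manual files are silently skipped, it can help to confirm that the .zip archives are actually in place before building the dataset. This sketch only lists the .zip files it finds. The helper name `find_manual_zips` is hypothetical, and it does not know which archive names TAO expects.

```python
from pathlib import Path

# Default manual directory named in the instructions.
DEFAULT_MANUAL_DIR = "~/tensorflow_datasets/downloads/manual/"


def find_manual_zips(manual_dir: str = DEFAULT_MANUAL_DIR) -> list[str]:
    """Return the sorted names of .zip files in `manual_dir`.

    Returns an empty list if the directory is missing or holds no archives.
    Hypothetical helper: it lists files only and does not validate them.
    """
    directory = Path(manual_dir).expanduser()
    if not directory.is_dir():
        return []
    # Only regular files count; a directory named "*.zip" is ignored.
    return sorted(p.name for p in directory.glob("*.zip") if p.is_file())
```

An empty result means the HACS and AVA portions will be skipped. To use a directory other than the default, TFDS accepts a `manual_dir` argument on `tfds.download.DownloadConfig`.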
Citation:

    @article{Dave_2020,
      title={TAO: A Large-Scale Benchmark for Tracking Any Object},
      ISBN={9783030585587},
      ISSN={1611-3349},
      url={http://dx.doi.org/10.1007/978-3-030-58558-7_26},
      DOI={10.1007/978-3-030-58558-7_26},
      journal={Lecture Notes in Computer Science},
      publisher={Springer International Publishing},
      author={Dave, Achal and Khurana, Tarasha and Tokmakov, Pavel and Schmid, Cordelia and Ramanan, Deva},
      year={2020},
      pages={436-454}
    }
tao/480_640 (default config)

Config description: All video frames are bilinearly resized to 480 x 640 (height x width).
Feature structure:

    FeaturesDict({
        'metadata': FeaturesDict({
            'dataset': string,
            'height': int32,
            'neg_category_ids': Tensor(shape=(None,), dtype=int32),
            'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
            'num_frames': int32,
            'video_name': string,
            'width': int32,
        }),
        'tracks': Sequence({
            'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
            'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
            'frames': Sequence(int32),
            'is_crowd': bool,
            'scale_category': string,
            'track_id': int32,
        }),
        'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
    })
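The `tracks/bboxes` entries use `BBoxFeature`, which in TFDS stores boxes as normalized `(ymin, xmin, ymax, xmax)` floats in [0, 1]. To draw them on the 480 x 640 frames of this config, they need scaling to pixels. This is a minimal pure-Python sketch under that convention. The function name and the `(x0, y0, x1, y1)` output order are illustrative choices, not a TFDS API.

```python
# Frame size (height, width) of the default tao/480_640 config.
FRAME_HEIGHT = 480
FRAME_WIDTH = 640


def to_pixel_boxes(bboxes, height=FRAME_HEIGHT, width=FRAME_WIDTH):
    """Convert normalized (ymin, xmin, ymax, xmax) boxes to pixel (x0, y0, x1, y1).

    Assumes the tfds.features.BBoxFeature convention: coordinates are floats
    in [0, 1], ordered ymin, xmin, ymax, xmax.
    """
    pixel_boxes = []
    for ymin, xmin, ymax, xmax in bboxes:
        # Scale x by the frame width and y by the frame height.
        pixel_boxes.append((xmin * width, ymin * height, xmax * width, ymax * height))
    return pixel_boxes
```

For example, `to_pixel_boxes([(0.5, 0.25, 1.0, 0.75)])` returns `[(160.0, 240.0, 480.0, 480.0)]`.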
Feature documentation:

| Feature                              | Class                 | Shape               | Dtype   |
|--------------------------------------|-----------------------|---------------------|---------|
|                                      | FeaturesDict          |                     |         |
| metadata                             | FeaturesDict          |                     |         |
| metadata/dataset                     | Tensor                |                     | string  |
| metadata/height                      | Tensor                |                     | int32   |
| metadata/neg_category_ids            | Tensor                | (None,)             | int32   |
| metadata/not_exhaustive_category_ids | Tensor                | (None,)             | int32   |
| metadata/num_frames                  | Tensor                |                     | int32   |
| metadata/video_name                  | Tensor                |                     | string  |
| metadata/width                       | Tensor                |                     | int32   |
| tracks                               | Sequence              |                     |         |
| tracks/bboxes                        | Sequence(BBoxFeature) | (None, 4)           | float32 |
| tracks/category                      | ClassLabel            |                     | int64   |
| tracks/frames                        | Sequence(Tensor)      | (None,)             | int32   |
| tracks/is_crowd                      | Tensor                |                     | bool    |
| tracks/scale_category                | Tensor                |                     | string  |
| tracks/track_id                      | Tensor                |                     | int32   |
| video                                | Video(Image)          | (None, 480, 640, 3) | uint8   |
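The annotations are stored track by track. Many consumers want them frame by frame instead. This sketch inverts the layout into a dictionary keyed by the values in `tracks/frames`. It assumes each track's fields have already been converted to plain Python lists. It also assumes `bboxes[t][i]` annotates `frames[t][i]`, which is an assumption about the pairing, not something this page states. The function name is hypothetical.

```python
from collections import defaultdict


def index_by_frame(tracks: dict) -> dict[int, list[dict]]:
    """Group track annotations by frame value.

    `tracks` is assumed to hold parallel per-track lists under 'track_id',
    'category', 'is_crowd', 'frames' and 'bboxes', where 'frames' and
    'bboxes' are themselves per-track lists of equal length.
    """
    per_frame = defaultdict(list)
    for t, track_id in enumerate(tracks["track_id"]):
        frames, boxes = tracks["frames"][t], tracks["bboxes"][t]
        if len(frames) != len(boxes):
            raise ValueError(f"track {track_id}: frames and bboxes differ in length")
        for frame, box in zip(frames, boxes):
            per_frame[int(frame)].append({
                "track_id": int(track_id),
                "category": int(tracks["category"][t]),
                "is_crowd": bool(tracks["is_crowd"][t]),
                "bbox": tuple(box),
            })
    # Return frames in ascending order.
    return dict(sorted(per_frame.items()))
```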
tao/full_resolution

Config description: The full-resolution version of the dataset.
Feature structure:

    FeaturesDict({
        'metadata': FeaturesDict({
            'dataset': string,
            'height': int32,
            'neg_category_ids': Tensor(shape=(None,), dtype=int32),
            'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
            'num_frames': int32,
            'video_name': string,
            'width': int32,
        }),
        'tracks': Sequence({
            'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
            'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
            'frames': Sequence(int32),
            'is_crowd': bool,
            'scale_category': string,
            'track_id': int32,
        }),
        'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
    })
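In this config the frame height and width are not fixed. It can therefore help to estimate how much memory a decoded video occupies before loading it. A `uint8` video tensor of shape `(T, H, W, 3)` takes `T * H * W * 3` bytes. This sketch just evaluates that product. The function name is hypothetical, and the frame counts and sizes in the usage note are illustrative values, not TAO statistics.

```python
def decoded_video_bytes(num_frames: int, height: int, width: int, channels: int = 3) -> int:
    """Bytes needed to hold a decoded uint8 video tensor of shape (T, H, W, C).

    Each uint8 value occupies exactly one byte.
    """
    return num_frames * height * width * channels
```

For 100 frames, a 480 x 640 video needs 92,160,000 bytes. A hypothetical 1080 x 1920 video of the same length needs 622,080,000 bytes, 6.75 times as much.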
Feature documentation:

| Feature                              | Class                 | Shape                 | Dtype   |
|--------------------------------------|-----------------------|-----------------------|---------|
|                                      | FeaturesDict          |                       |         |
| metadata                             | FeaturesDict          |                       |         |
| metadata/dataset                     | Tensor                |                       | string  |
| metadata/height                      | Tensor                |                       | int32   |
| metadata/neg_category_ids            | Tensor                | (None,)               | int32   |
| metadata/not_exhaustive_category_ids | Tensor                | (None,)               | int32   |
| metadata/num_frames                  | Tensor                |                       | int32   |
| metadata/video_name                  | Tensor                |                       | string  |
| metadata/width                       | Tensor                |                       | int32   |
| tracks                               | Sequence              |                       |         |
| tracks/bboxes                        | Sequence(BBoxFeature) | (None, 4)             | float32 |
| tracks/category                      | ClassLabel            |                       | int64   |
| tracks/frames                        | Sequence(Tensor)      | (None,)               | int32   |
| tracks/is_crowd                      | Tensor                |                       | bool    |
| tracks/scale_category                | Tensor                |                       | string  |
| tracks/track_id                      | Tensor                |                       | int32   |
| video                                | Video(Image)          | (None, None, None, 3) | uint8   |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-03-14 UTC.