Customizing feature decoding
The tfds.decode API allows you to override the default feature decoding. The main
use case is to skip the image decoding for better performance.
Note: This API gives you access to the low-level tf.train.Example format on disk
(as defined by the FeatureConnector). It is targeted towards advanced users who
want better read performance with images.
Usage examples
Skipping the image decoding
To keep full control over the decoding pipeline, or to apply a filter before the
images get decoded (for better performance), you can skip the image decoding
entirely. This works with both tfds.features.Image and tfds.features.Video.
    ds = tfds.load('imagenet2012', split='train', decoders={
        'image': tfds.decode.SkipDecoding(),
    })

    for example in ds.take(1):
      assert example['image'].dtype == tf.string  # Images are not decoded
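The raw bytes can still be decoded manually later on. As a minimal sketch (this
snippet is not part of the original example; it simply reuses
ds_info.features['image'].decode_example, which the next section also relies on):

    # Sketch: decode the skipped bytes by hand for a single example.
    ds, ds_info = tfds.load(
        'imagenet2012',
        split='train',
        with_info=True,
        decoders={'image': tfds.decode.SkipDecoding()},
    )
    for example in ds.take(1):
      image = ds_info.features['image'].decode_example(example['image'])
      print(image.shape, image.dtype)  # Decoded back to a uint8 image tensor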
Filter/shuffle dataset before images get decoded
Similar to the previous example, you can use tfds.decode.SkipDecoding() to
insert additional tf.data pipeline customization before decoding the images.
That way the filtered images won't be decoded and you can use a bigger shuffle
buffer.
    # Load the base dataset without decoding
    ds, ds_info = tfds.load(
        'imagenet2012',
        split='train',
        decoders={
            'image': tfds.decode.SkipDecoding(),  # Image won't be decoded here
        },
        as_supervised=True,
        with_info=True,
    )
    # Apply filter and shuffle
    ds = ds.filter(lambda image, label: label != 10)
    ds = ds.shuffle(10000)
    # Then decode with ds_info.features['image']
    ds = ds.map(
        lambda image, label: (ds_info.features['image'].decode_example(image), label))
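As a possible refinement (an assumption about a typical pipeline, not part of
the original example), the decode step can be run in parallel and followed by
the usual batching and prefetching:

    # Sketch: parallel decoding, used in place of the plain ds.map(...) above.
    ds = ds.map(
        lambda image, label: (ds_info.features['image'].decode_example(image), label),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    ds = ds.batch(32)
    ds = ds.prefetch(tf.data.AUTOTUNE)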
Cropping and decoding at the same time
To override the default tf.io.decode_image operation, you can create a new
tfds.decode.Decoder object using the tfds.decode.make_decoder() decorator.

    @tfds.decode.make_decoder()
    def decode_example(serialized_image, feature):
      crop_y, crop_x, crop_height, crop_width = 10, 10, 64, 64
      return tf.image.decode_and_crop_jpeg(
          serialized_image,
          [crop_y, crop_x, crop_height, crop_width],
          channels=feature.feature.shape[-1],
      )

    ds = tfds.load('imagenet2012', split='train', decoders={
        'image': decode_example(),
    })

Which is equivalent to:

    def decode_example(serialized_image, feature):
      crop_y, crop_x, crop_height, crop_width = 10, 10, 64, 64
      return tf.image.decode_and_crop_jpeg(
          serialized_image,
          [crop_y, crop_x, crop_height, crop_width],
          channels=feature.shape[-1],
      )

    ds, ds_info = tfds.load(
        'imagenet2012',
        split='train',
        with_info=True,
        decoders={
            'image': tfds.decode.SkipDecoding(),  # Image is kept as raw bytes
        },
    )
    ds = ds.map(
        lambda example: dict(
            example,
            image=decode_example(example['image'],
                                 feature=ds_info.features['image']),
        ))
Customizing video decoding
Videos are Sequence(Image()). When applying custom decoders, they will be
applied to individual frames. This means decoders for images are automatically
compatible with videos.
    @tfds.decode.make_decoder()
    def decode_example(serialized_image, feature):
      crop_y, crop_x, crop_height, crop_width = 10, 10, 64, 64
      return tf.image.decode_and_crop_jpeg(
          serialized_image,
          [crop_y, crop_x, crop_height, crop_width],
          channels=feature.feature.shape[-1],
      )

    ds = tfds.load('ucf101', split='train', decoders={
        # With video, decoders are applied to individual frames
        'video': decode_example(),
    })
Which is equivalent to:
    def decode_frame(serialized_image):
      """Decodes a single frame."""
      crop_y, crop_x, crop_height, crop_width = 10, 10, 64, 64
      return tf.image.decode_and_crop_jpeg(
          serialized_image,
          [crop_y, crop_x, crop_height, crop_width],
          channels=ds_info.features['video'].shape[-1],
      )

    def decode_video(example):
      """Decodes all individual frames of the video."""
      video = example['video']
      video = tf.map_fn(
          decode_frame,
          video,
          dtype=ds_info.features['video'].dtype,
          parallel_iterations=10,
      )
      example['video'] = video
      return example

    ds, ds_info = tfds.load('ucf101', split='train', with_info=True, decoders={
        'video': tfds.decode.SkipDecoding(),  # Skip frame decoding
    })
    ds = ds.map(decode_video)  # Decode the video
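One pattern this enables (a sketch, assuming decode_frame and ds_info from the
example above are in scope, and that this map is used instead of decode_video):
sub-sample the serialized frames before decoding, so only the frames you keep
pay the JPEG decoding cost.

    def subsample_and_decode(example):
      """Keeps every 4th frame, then decodes only those frames."""
      frames = example['video'][::4]  # Frames are still serialized bytes here
      example['video'] = tf.map_fn(
          decode_frame,
          frames,
          dtype=ds_info.features['video'].dtype,
          parallel_iterations=10,
      )
      return example

    ds = ds.map(subsample_and_decode)  # Used in place of the decode_video map above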
Only decode a subset of the features
It's also possible to entirely skip some features by specifying only the
features you need. All other features will be ignored/skipped.

    builder = tfds.builder('my_dataset')
    builder.as_dataset(split='train', decoders=tfds.decode.PartialDecoding({
        'image': True,
        'metadata': {'num_objects', 'scene_name'},
        'objects': {'label'},
    }))
TFDS will select the subset of builder.info.features matching the given
tfds.decode.PartialDecoding structure.
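To make the selection concrete (a sketch, assuming a hypothetical 'my_dataset'
that actually exposes the features listed above), the resulting examples simply
no longer contain the non-requested keys:

    builder = tfds.builder('my_dataset')
    ds = builder.as_dataset(split='train', decoders=tfds.decode.PartialDecoding({
        'image': True,
        'metadata': {'num_objects', 'scene_name'},
        'objects': {'label'},
    }))

    for example in ds.take(1):
      print(sorted(example.keys()))       # Only the requested top-level features
      print(sorted(example['metadata']))  # ['num_objects', 'scene_name']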
In the above code, the features are implicitly extracted to match
builder.info.features. It is also possible to explicitly define the features.
The above code is equivalent to:
    builder = tfds.builder('my_dataset')
    builder.as_dataset(split='train', decoders=tfds.decode.PartialDecoding({
        'image': tfds.features.Image(),
        'metadata': {
            'num_objects': tf.int64,
            'scene_name': tfds.features.Text(),
        },
        'objects': tfds.features.Sequence({
            'label': tfds.features.ClassLabel(names=[]),
        }),
    }))

The original metadata (label names, image shape, ...) are automatically reused,
so it's not required to provide them.

tfds.decode.SkipDecoding can be passed to tfds.decode.PartialDecoding, through
the PartialDecoding(..., decoders={}) kwargs.
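For example (a sketch using the hypothetical 'my_dataset' from above), the
selected image can be kept as raw bytes while the other requested features are
decoded normally:

    builder = tfds.builder('my_dataset')
    ds = builder.as_dataset(
        split='train',
        decoders=tfds.decode.PartialDecoding(
            {'image': True, 'objects': {'label'}},
            decoders={'image': tfds.decode.SkipDecoding()},  # Keep image as bytes
        ),
    )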