TFDS CLI

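The TFDS CLI is a command-line tool for working with TensorFlow Datasets. Its commands include tfds new, which creates the skeleton of a new dataset, and tfds build, which downloads and prepares a dataset.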

Disable TF logs on import
%%capture
# Disable logs on TF import (kept on its own line so that %env does not read the comment as part of the value)
%env TF_CPP_MIN_LOG_LEVEL=1

Installation

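The CLI tool is installed with tensorflow-datasets (or tfds-nightly). Some tfds-nightly builds also import mlcroissant when the CLI starts and fail with ModuleNotFoundError if it is absent, so it is installed here as well.

pip install -q tfds-nightly apache-beam mlcroissant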
tfds --version
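
If tfds --version fails at startup, a quick preflight check can show whether the executable or one of its imports is missing. The sketch below is only an illustration and is not part of TFDS: the helper name missing_cli_requirements is hypothetical, and the default module list is an assumption (mlcroissant is included because some tfds-nightly builds import it when the CLI starts).

```python
import importlib.util
import shutil


def missing_cli_requirements(modules=("tensorflow_datasets", "mlcroissant")):
    """Return names of the pieces the `tfds` command needs but cannot find.

    Hypothetical helper; the default module list is an assumption.
    """
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when no `tfds` executable is on PATH.
    if shutil.which("tfds") is None:
        missing.append("tfds (executable)")
    return missing


# An empty list means every checked piece was found.
print(missing_cli_requirements())
```

Anything reported as missing can usually be installed with pip before retrying the CLI.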

To list all CLI commands:

tfds --help

tfds new: Implementing a new Dataset

This command kickstarts a new Python dataset by creating a <dataset_name>/ directory that contains default implementation files.

Usage:

tfds new my_dataset

To list the files that tfds new my_dataset creates:

ls -1 my_dataset/

The optional --data_format flag generates a format-specific dataset builder (e.g., conll). If no data format is given, the command generates a template for a standard tfds.core.GeneratorBasedBuilder. Refer to the documentation for details on the available format-specific dataset builders.

See our writing dataset guide for more info.
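
To script the scaffolding step, the command line can be assembled in Python. This is a minimal sketch under stated assumptions: tfds_new_args and list_generated are hypothetical helper names that do not exist in TFDS, and only the --data_format flag and the conll value come from the text above. The sketch prints the commands rather than running them.

```python
import pathlib
import shlex


def tfds_new_args(dataset_name, data_format=None):
    """Build the argument list for `tfds new` (hypothetical helper)."""
    args = ["tfds", "new", dataset_name]
    if data_format is not None:
        # Without this flag, the standard GeneratorBasedBuilder template is generated.
        args += ["--data_format", data_format]
    return args


def list_generated(dataset_dir):
    """Return the sorted entry names in dataset_dir, similar to `ls -1`."""
    return sorted(entry.name for entry in pathlib.Path(dataset_dir).iterdir())


# Print the commands instead of running them, so the sketch has no side effects.
print(shlex.join(tfds_new_args("my_dataset")))
print(shlex.join(tfds_new_args("my_dataset", data_format="conll")))
```

To actually create the directory, the list can be passed to subprocess.run(tfds_new_args("my_dataset"), check=True), after which list_generated("my_dataset") shows what was written.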

Available options:

tfds new --help

tfds build: Download and prepare a dataset

Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

  • A path to a dataset/ folder or a dataset.py file (leave it empty to use the current directory):

    • tfds build datasets/my_dataset/
    • cd datasets/my_dataset/ && tfds build
    • cd datasets/my_dataset/ && tfds build my_dataset
    • cd datasets/my_dataset/ && tfds build my_dataset.py
  • A registered dataset:

    • tfds build mnist
    • tfds build my_dataset --imports my_project.datasets
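
The invocation forms above differ only in the target argument and the working directory. As a rough sketch, they can be expressed as an argument list plus a working directory for subprocess.run. The helper name tfds_build_args is hypothetical and not part of TFDS, and the only flag used is --imports, taken from the list above.

```python
import shlex


def tfds_build_args(target=None, imports=None):
    """Build the argument list for `tfds build` (hypothetical helper).

    target:  dataset folder, .py file, or registered name; None means
             "use the current directory".
    imports: module to import so that a custom dataset gets registered.
    """
    args = ["tfds", "build"]
    if target is not None:
        args.append(target)
    if imports is not None:
        args += ["--imports", imports]
    return args


# (argument list, working directory) pairs mirroring the forms listed above.
examples = [
    (tfds_build_args("datasets/my_dataset/"), None),
    (tfds_build_args(), "datasets/my_dataset/"),
    (tfds_build_args("my_dataset.py"), "datasets/my_dataset/"),
    (tfds_build_args("mnist"), None),
    (tfds_build_args("my_dataset", imports="my_project.datasets"), None),
]

# Print each form as the equivalent shell command; nothing is executed here.
for args, cwd in examples:
    prefix = f"cd {cwd} && " if cwd else ""
    print(prefix + shlex.join(args))
```

Running one of these forms is then a matter of calling subprocess.run(args, cwd=cwd, check=True).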

Available options:

tfds build --help