TFDS CLI is a command-line tool that provides various commands to easily work with TensorFlow Datasets.
Disable TF logs on import
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1 # Disable logs on TF import
Installation
The CLI tool is installed with tensorflow-datasets (or tfds-nightly).
pip install -q tfds-nightly apache-beam
tfds --version
For the list of all CLI commands:
tfds --help
Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/bin/tfds", line 3, in <module>
    from tensorflow_datasets.scripts.cli.main import launch_cli
  File "/tmpfs/src/tf_docs_env/lib/python3.10/site-packages/tensorflow_datasets/scripts/cli/main.py", line 38, in <module>
    from tensorflow_datasets.scripts.cli import croissant
  File "/tmpfs/src/tf_docs_env/lib/python3.10/site-packages/tensorflow_datasets/scripts/cli/croissant.py", line 36, in <module>
    import mlcroissant as mlc
ModuleNotFoundError: No module named 'mlcroissant'
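The traceback above shows the CLI failing on import because the mlcroissant module is not installed. A small preflight check along the following lines may help spot that situation before running any tfds command. This is only a sketch: the helper names missing_modules and cli_on_path are hypothetical, and the list of modules to check is taken from the traceback rather than from any official requirements list.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def cli_on_path(executable="tfds"):
    """Return the resolved path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    # Module names come from the traceback above; extend the list as needed.
    missing = missing_modules(["tensorflow_datasets", "mlcroissant"])
    print("tfds executable:", cli_on_path() or "not found")
    print("missing modules:", ", ".join(missing) or "none")
```

If mlcroissant is reported as missing, installing it with pip is the likely remedy, although other optional dependencies may be needed depending on the installed version.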
tfds new: Implementing a new Dataset
This command helps you kickstart writing your new Python dataset by creating a <dataset_name>/ directory containing default implementation files.
Usage:
tfds new my_dataset
tfds new my_dataset will create:
ls -1 my_dataset/
An optional flag --data_format can be used to generate format-specific dataset builders (e.g., conll). If no data format is given, it will generate a template for a standard tfds.core.GeneratorBasedBuilder.
Refer to the documentation for details on the available format-specific dataset builders.
See our writing dataset guide for more info.
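When scripting dataset creation, the tfds new invocation and the follow-up directory listing can be driven from Python. The sketch below assumes only what this section states: the tfds new <name> form and the optional --data_format flag. The helper names tfds_new_command and list_dir are hypothetical, and nothing here is part of the TFDS API.

```python
import subprocess
from pathlib import Path


def tfds_new_command(dataset_name, data_format=None):
    """Build the argument list for `tfds new`, optionally with --data_format."""
    argv = ["tfds", "new", dataset_name]
    if data_format is not None:
        argv += ["--data_format", data_format]
    return argv


def list_dir(path):
    """Return the sorted entry names in a directory, similar to `ls -1`."""
    return sorted(entry.name for entry in Path(path).iterdir())


def create_dataset(dataset_name, data_format=None):
    """Run `tfds new` and return the names of the generated entries."""
    # check=True raises CalledProcessError if the CLI exits with an error.
    subprocess.run(tfds_new_command(dataset_name, data_format), check=True)
    return list_dir(dataset_name)
```

For example, tfds_new_command("my_dataset", data_format="conll") produces the same arguments as typing tfds new my_dataset --data_format conll in a shell.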
Available options:
tfds new --help
tfds build: Download and prepare a dataset
Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:
A path to a dataset/ folder or dataset.py file (empty for the current directory):
tfds build datasets/my_dataset/
cd datasets/my_dataset/ && tfds build
cd datasets/my_dataset/ && tfds build my_dataset
cd datasets/my_dataset/ && tfds build my_dataset.py
A registered dataset:
tfds build mnist
tfds build my_dataset --imports my_project.datasets
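The invocation forms listed above differ only in the target argument, the working directory, and the optional --imports flag. As an illustration, they can be expressed through one small wrapper. This is a sketch under those assumptions; tfds_build_command and run_build are hypothetical helper names, not functions provided by TFDS.

```python
import subprocess


def tfds_build_command(target=None, imports=None):
    """Build the argument list for `tfds build`.

    target:  a dataset folder, a .py file, or a registered name;
             None means "use the current directory".
    imports: module path passed to --imports so a custom dataset is registered.
    """
    argv = ["tfds", "build"]
    if target is not None:
        argv.append(str(target))
    if imports is not None:
        argv += ["--imports", imports]
    return argv


def run_build(target=None, imports=None, cwd=None):
    """Run `tfds build` from the directory `cwd` (None keeps the current one)."""
    # Passing cwd mirrors the `cd datasets/my_dataset/ && tfds build` form.
    return subprocess.run(tfds_build_command(target, imports), cwd=cwd, check=True)
```

For instance, run_build(cwd="datasets/my_dataset/") corresponds to cd datasets/my_dataset/ && tfds build, and run_build("my_dataset", imports="my_project.datasets") corresponds to the last command in the list.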
Available options:
tfds build --help