tfds.core.SequentialWriter
Class to write a TFDS dataset sequentially.
tfds.core.SequentialWriter(
ds_info: dataset_info.DatasetInfo,
max_examples_per_shard: int,
overwrite: bool = True,
file_format: str = 'tfrecord'
)
The SequentialWriter can be used to generate TFDS datasets by directly
appending TF Examples to the desired splits.
Once the user creates a SequentialWriter with a given DatasetInfo, they can
create splits, append examples to them, and close them whenever they are
finished.
Note that:
- Not closing a split may cause data to be lost.
- The examples are written to disk in the same order that they are given to
the writer.
- Since the SequentialWriter doesn't know how many examples are going to be
written, it can't estimate the optimal number of shards per split. Use the
max_examples_per_shard parameter in the constructor to control how many
elements there should be per shard.
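As a rough illustration of the last point, the shard count of a split can be estimated from the number of examples and max_examples_per_shard. This is a sketch that assumes the writer starts a new shard each time the current one reaches the limit; the helper name estimate_num_shards is hypothetical and is not part of TFDS.

```python
import math


def estimate_num_shards(num_examples: int, max_examples_per_shard: int) -> int:
    """Estimates the shard count, assuming a new shard starts once one is full."""
    if max_examples_per_shard <= 0:
        raise ValueError("max_examples_per_shard must be positive")
    return math.ceil(num_examples / max_examples_per_shard)


# 2500 examples with at most 1000 per shard: two full shards plus a partial one.
print(estimate_num_shards(2500, 1000))
```

A smaller max_examples_per_shard yields more, smaller files; a larger value yields fewer, larger files.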
The datasets written with this writer can be read directly with
tfds.builder_from_directories.
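Reading a written dataset back might look like the sketch below. It assumes that tensorflow_datasets is installed and that dataset_dir points at the directory the writer wrote to; the helper name load_split and its arguments are hypothetical, and the import is kept inside the function so that defining the helper does not require TFDS.

```python
def load_split(dataset_dir: str, split: str = "train"):
    """Loads one split of a dataset stored in `dataset_dir` (assumed path)."""
    # Imported lazily so the helper can be defined without TFDS installed.
    import tensorflow_datasets as tfds

    # builder_from_directories takes a list of dataset directories.
    builder = tfds.builder_from_directories([dataset_dir])
    return builder.as_dataset(split=split)
```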
Example:
import tensorflow_datasets as tfds

writer = tfds.core.SequentialWriter(ds_info=ds_info, max_examples_per_shard=1000)
writer.initialize_splits(['train', 'test'])

while (...):
  # Code that generates the examples
  writer.add_examples({'train': [example1, example2],
                       'test': [example3]})
  ...

# close_splits requires a list of split names; close_all closes every open split.
writer.close_all()
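Because an unclosed split may lose data, one option is to guarantee the close call even when example generation fails. The following is a minimal sketch, not part of TFDS: closing_all_splits is a hypothetical helper that relies only on the writer exposing close_all().

```python
import contextlib


@contextlib.contextmanager
def closing_all_splits(writer):
    """Yields the writer and calls close_all() on exit, even after an error."""
    try:
        yield writer
    finally:
        # Runs on normal exit and when the body raises.
        writer.close_all()
```

It would be used as `with closing_all_splits(writer): ...`, with the initialize_splits and add_examples calls placed inside the block.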
Args
ds_info | DatasetInfo for this dataset.
max_examples_per_shard | maximum number of examples to write per shard.
overwrite | if True, it ignores and overwrites any existing data. Otherwise, it loads the existing dataset and appends the new data (new data will always be created as new shards).
file_format | An entry in file_adapters.FileFormat.
Methods
add_examples
add_examples(
split_examples: Dict[str, List[Any]]
) -> None
Adds examples to the splits.
Args
split_examples | dictionary of split_name: list_of_examples, giving the list of examples to add to each split. Not all the existing splits have to be in the dictionary.
Raises
KeyError | if any of the splits doesn't exist.
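When examples arrive as a stream of (split_name, example) pairs, they can be grouped into the dictionary shape that add_examples expects. This is a sketch under that assumption; group_by_split and the sample dictionaries are hypothetical, and the actual example format depends on the features declared in the DatasetInfo.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_by_split(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Groups (split_name, example) pairs by split, preserving arrival order."""
    grouped = defaultdict(list)
    for split_name, example in pairs:
        grouped[split_name].append(example)
    return dict(grouped)


batch = group_by_split(
    [("train", {"x": 1}), ("test", {"x": 2}), ("train", {"x": 3})]
)
print(batch)  # {'train': [{'x': 1}, {'x': 3}], 'test': [{'x': 2}]}
```

The result can be passed as split_examples, provided every key names a split that has already been initialized.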
close_all
close_all() -> None
Closes all the open splits.
close_splits
close_splits(
splits: List[str]
) -> None
Closes the given list of splits.
Args
splits | list of split names.
Raises
KeyError | if any of the splits doesn't exist.
initialize_splits
initialize_splits(
splits: List[str], fail_if_exists: bool = True
) -> None
Adds new splits to the dataset.
Args
splits | list of split names to add.
fail_if_exists | if True, the call fails when a split already contains data.
Raises
KeyError | if the split is already present.
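Since initialize_splits raises KeyError for a split that is already present, a caller that discovers split names lazily may want to track which ones it has initialized. The sketch below does that bookkeeping on the caller's side; initialize_missing_splits and the known set are hypothetical and not part of TFDS.

```python
from typing import Iterable, List, Set


def initialize_missing_splits(
    writer, wanted: Iterable[str], known: Set[str]
) -> List[str]:
    """Initializes only the splits not yet recorded in `known`."""
    # dict.fromkeys drops duplicate names while keeping their order.
    missing = [name for name in dict.fromkeys(wanted) if name not in known]
    if missing:
        writer.initialize_splits(missing)
        known.update(missing)
    return missing
```

Calling it before each add_examples call with the keys of the batch keeps repeated split names from reaching initialize_splits twice.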
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.