Class to write a TFDS dataset sequentially.
tfds.core.SequentialWriter(
ds_info: dataset_info.DatasetInfo,
max_examples_per_shard: int,
overwrite: bool = True,
file_format: str = 'tfrecord'
)
The SequentialWriter can be used to generate TFDS datasets by directly appending TF Examples to the desired splits.
Once the user creates a SequentialWriter with a given DatasetInfo, they can create splits, append examples to them, and close them whenever they are finished.
Note that:
- Not closing a split may cause data to be lost.
- The examples are written to disk in the same order that they are given to the writer.
- Since the SequentialWriter doesn't know how many examples are going to be written, it can't estimate the optimal number of shards per split. Use the max_examples_per_shard parameter in the constructor to control how many elements there should be per shard.
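To make the effect of max_examples_per_shard concrete, here is a minimal sketch of the resulting shard count. It assumes the writer starts a new shard as soon as the current one holds max_examples_per_shard examples; expected_num_shards is a hypothetical helper for illustration, not part of the TFDS API.

```python
import math


def expected_num_shards(num_examples: int, max_examples_per_shard: int) -> int:
    """Estimates the shard count of one split (hypothetical helper, not TFDS API).

    Assumption: a new shard is started once the current shard is full.
    """
    if max_examples_per_shard <= 0:
        raise ValueError("max_examples_per_shard must be positive")
    return math.ceil(num_examples / max_examples_per_shard)


# 2500 examples with at most 1000 per shard need three shards.
print(expected_num_shards(2500, 1000))  # 3
```

A smaller max_examples_per_shard gives more, smaller files; a larger one gives fewer, bigger files.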
The datasets written with this writer can be read directly with tfds.builder_from_directories.
Example:
    writer = SequentialWriter(ds_info=ds_info, max_examples_per_shard=1000)
    writer.initialize_splits(['train', 'test'])

    while (...):
      # Code that generates the examples
      writer.add_examples({'train': [example1, example2],
                           'test': [example3]})
      ...

    # close_splits() requires a list of split names; close_all() closes every open split.
    writer.close_all()
Methods
add_examples
add_examples(
split_examples: Dict[str, List[Any]]
) -> None
Adds examples to the splits.
Args | |
---|---|
split_examples | Dictionary of split_name: list_of_examples, giving the list of examples to add to each split. Not every existing split has to appear in the dictionary. |
Raises | |
---|---|
KeyError | If any of the splits doesn't exist. |
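The split_examples argument is a plain dictionary, so a stream of (split name, example) pairs has to be grouped before it is passed in. A minimal sketch of that grouping follows; group_by_split and the example dictionaries are hypothetical and not part of TFDS, and the final call is commented out because it assumes an already initialized writer.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_by_split(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Groups (split_name, example) pairs, keeping the order within each split."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for split_name, example in pairs:
        grouped[split_name].append(example)
    return dict(grouped)


batch = group_by_split([
    ("train", {"x": 1}),
    ("test", {"x": 2}),
    ("train", {"x": 3}),
])
print(batch)  # {'train': [{'x': 1}, {'x': 3}], 'test': [{'x': 2}]}

# Assumption: `writer` is a SequentialWriter whose 'train' and 'test'
# splits were already created with initialize_splits.
# writer.add_examples(batch)
```

Because examples are written in the order they are given, keeping the per-split order during grouping preserves the on-disk order.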
close_all
close_all() -> None
Closes all the open splits.
close_splits
close_splits(
splits: List[str]
) -> None
Closes the given list of splits.
Args | |
---|---|
splits | List of split names. |
Raises | |
---|---|
KeyError | If any of the splits doesn't exist. |
initialize_splits
initialize_splits(
splits: List[str], fail_if_exists: bool = True
) -> None
Adds new splits to the dataset.
Args | |
---|---|
splits | List of split names to add. |
fail_if_exists | Whether to fail if a split already contains data. |
Raises | |
---|---|
KeyError | If the split is already present. |