Template to produce filenames for sharded datasets.
tfds.core.ShardedFileTemplate(
data_dir: epath.Path,
template: str = DEFAULT_FILENAME_TEMPLATE,
dataset_name: Optional[str] = None,
split: Optional[str] = None,
filetype_suffix: Optional[str] = None
)
Methods
filepath_prefix
filepath_prefix() -> str
is_valid
is_valid(
filename: str
) -> bool
Returns whether the given filename follows this template.
parse_filename_info
parse_filename_info(
filename: str
) -> Optional[FilenameInfo]
Parses the filename using this template.
If the filename does not specify the dataset name, split, or filetype suffix but this template does, the value from the template is used.
Arguments | |
---|---|
filename | The filename that should be parsed. |
Returns | |
---|---|
the FilenameInfo corresponding to the given file if it could be parsed. None otherwise. |
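The parsing behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the TFDS implementation: `ParsedName`, `parse_name`, and `default_dataset` are hypothetical names, and the regular expression assumes the default template with five-digit, zero-padded shard numbers.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedName:
    """Hypothetical stand-in for FilenameInfo."""
    dataset_name: str
    split: str
    filetype_suffix: str
    shard_index: int
    num_shards: int


# Assumed layout: [DATASET-]SPLIT.FILEFORMAT-XXXXX-of-YYYYY
_PATTERN = re.compile(
    r"^(?:(?P<dataset>\w+)-)?(?P<split>\w+)\.(?P<suffix>\w+)"
    r"-(?P<index>\d{5})-of-(?P<total>\d{5})$"
)


def parse_name(filename: str,
               default_dataset: Optional[str] = None) -> Optional[ParsedName]:
    """Parses a shard filename; returns None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    # Fall back to the template-level value when the filename omits it.
    dataset = match["dataset"] or default_dataset
    if dataset is None:
        return None
    return ParsedName(
        dataset_name=dataset,
        split=match["split"],
        filetype_suffix=match["suffix"],
        shard_index=int(match["index"]),
        num_shards=int(match["total"]),
    )


print(parse_name("mnist-train.tfrecord-00000-of-00002"))
print(parse_name("train.tfrecord-00001-of-00002", default_dataset="mnist"))
print(parse_name("not-a-shard.txt"))
```

The second call shows the fallback: the filename carries no dataset name, so the value supplied alongside the template is used instead.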
relative_filepath
relative_filepath(
*, shard_index: int, num_shards: Optional[int]
) -> str
Returns the path (relative to the data dir) of the shard.
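To make the naming scheme concrete, the following sketch expands the default template `'{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}'` for a single shard. It is a plain-Python stand-in rather than the library code: `expand_template` is a hypothetical helper, and the five-digit zero padding of shard numbers is an assumption based on typical TFDS filenames such as `mnist-train.tfrecord-00000-of-00001`.

```python
# Illustrative only: a plain-Python stand-in, not the TFDS implementation.
DEFAULT_TEMPLATE = "{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}"


def expand_template(dataset: str, split: str, fileformat: str,
                    shard_index: int, num_shards: int,
                    template: str = DEFAULT_TEMPLATE) -> str:
    """Fills the template placeholders for one shard (hypothetical helper)."""
    # Assumption: shard numbers are zero-padded to five digits.
    shard_x_of_y = f"{shard_index:05d}-of-{num_shards:05d}"
    return template.format(
        DATASET=dataset,
        SPLIT=split,
        FILEFORMAT=fileformat,
        SHARD_X_OF_Y=shard_x_of_y,
    )


print(expand_template("mnist", "train", "tfrecord", shard_index=0, num_shards=2))
# prints: mnist-train.tfrecord-00000-of-00002
```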
replace
replace(
**kwargs
) -> 'ShardedFileTemplate'
Returns a copy of the ShardedFileTemplate with updated attributes.
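The copy-with-updates pattern can be sketched with a frozen dataclass. This assumes the template behaves like an immutable dataclass, which this page does not confirm; `TemplateSketch` is a hypothetical stand-in and uses a plain string for `data_dir` instead of `epath.Path`.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class TemplateSketch:
    """Hypothetical stand-in, not the TFDS class."""
    data_dir: str
    dataset_name: Optional[str] = None
    split: Optional[str] = None
    filetype_suffix: Optional[str] = None

    def replace(self, **kwargs) -> "TemplateSketch":
        # Returns a new instance; the original is left unchanged.
        return dataclasses.replace(self, **kwargs)


base = TemplateSketch(data_dir="/data", dataset_name="mnist")
train = base.replace(split="train")
print(train.split)  # prints: train
print(base.split)   # prints: None
```

Returning a copy rather than mutating in place lets one base template be specialised per split without the variants affecting each other.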
sharded_filenames
sharded_filenames(
num_shards: int
) -> List[str]
sharded_filepath
sharded_filepath(
*, shard_index: int, num_shards: Optional[int]
) -> epath.Path
Returns the filename (including full path if data_dir is set) for the given shard.
sharded_filepaths
sharded_filepaths(
num_shards: int
) -> List[epath.Path]
sharded_filepaths_pattern
sharded_filepaths_pattern(
*, num_shards: Optional[int] = None
) -> str
Returns a pattern describing all the file paths captured by this template.
If num_shards is given, then it returns '/path/dataset_name-split.fileformat@num_shards'. If num_shards is not given, then it returns '/path/dataset_name-split.fileformat*'.
Args | |
---|---|
num_shards | Optional specification of the number of shards. |
Returns | |
---|---|
the pattern describing all shards captured by this template. |
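The two pattern forms described above can be sketched as follows. `shard_pattern` and its `prefix` argument are hypothetical; the sketch only mirrors the documented output shapes and does not call the library.

```python
from typing import Optional


def shard_pattern(prefix: str, num_shards: Optional[int] = None) -> str:
    """Builds a pattern covering all shards under a prefix (hypothetical helper)."""
    if num_shards is not None:
        # Known shard count: use the '@N' sharded-file notation.
        return f"{prefix}@{num_shards}"
    # Unknown shard count: fall back to a glob over everything under the prefix.
    return f"{prefix}*"


prefix = "/path/dataset_name-split.fileformat"
print(shard_pattern(prefix, num_shards=8))  # prints: /path/dataset_name-split.fileformat@8
print(shard_pattern(prefix))                # prints: /path/dataset_name-split.fileformat*
```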
__eq__
__eq__(
other
)
Class Variables | |
---|---|
dataset_name | None |
filetype_suffix | None |
split | None |
template | '{DATASET}-{SPLIT}.{FILEFORMAT}-{SHARD_X_OF_Y}' |