A Dataset feature encodes a nested dataset.
Inherits From: Sequence, FeatureConnector
tfds.features.Dataset(
feature: feature_lib.FeatureConnectorArg,
length: Optional[int] = None,
*,
doc: feature_lib.DocArg = None
)
Dataset corresponds to a dataset of tfds.features.FeatureConnector elements. Using tfds.features.Dataset returns a nested tf.data.Dataset inside the top-level tf.data.Dataset returned by tfds.load. At generation time, the feature is given an iterable over the nested dataset's elements.
This is an experimental feature. Currently, only one level of nesting is supported, and TF1 graph mode is not supported.
Example:
At construction time (inside _info):
import numpy as np
import tensorflow_datasets as tfds

features = tfds.features.FeaturesDict({
    'agent_id': np.object_,
    'episode': tfds.features.Dataset({
        'observation': tfds.features.Image(),
        'reward': np.int32,
    }),
})
Will return:
{
'agent_id': tf.Tensor(shape=(), dtype=tf.string),
'episode': tf.data.Dataset(element_spec={
'observation': tf.Tensor(shape=(None, None, 3), dtype=tf.uint8),
'reward': tf.Tensor(shape=(), dtype=tf.int32),
}),
}
The nested dataset can be used as:
for e in tfds.load(...):  # {'agent_id': tf.Tensor, 'episode': tf.data.Dataset}
  for step in e['episode']:  # Each episode is a nested `tf.data.Dataset`
    step['observation']
During generation, it accepts any Iterable/Iterator, like:
yield _, {
    'agent_id': agent_name,
    'episode': ({'observation': ..., 'reward': ...} for _ in range(10)),
}
Or a dictionary of Iterables, like:
yield _, {
    'agent_id': agent_name,
    'episode': {'observation': np.ones(10), 'reward': np.ones(10)},
}
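The two generation-time forms above carry the same information: a dictionary of equal-length sequences is the column-wise view of an iterable of per-step dicts. The sketch below illustrates that correspondence with a hypothetical helper, dict_of_sequences_to_steps, using plain Python lists in place of NumPy arrays; it is not how TFDS performs the conversion internally.

```python
from typing import Any, Dict, Iterator, Mapping, Sequence


def dict_of_sequences_to_steps(
    columns: Mapping[str, Sequence[Any]],
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yields one {key: value} dict per step."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"All columns must have the same length, got {lengths}")
    keys = list(columns)
    # zip walks all columns in lockstep, producing one row per step.
    for row in zip(*(columns[k] for k in keys)):
        yield dict(zip(keys, row))


# Both forms describe the same 3-step episode.
columns = {'observation': [10, 11, 12], 'reward': [0, 1, 0]}
steps = list(dict_of_sequences_to_steps(columns))
print(steps)
```

Requiring equal lengths mirrors the fact that every step of the nested dataset must have a value for every sub-feature.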
Args | |
---|---|
feature | The features to wrap (any feature supported) |
length | int, length of the sequence if static and known in advance |
doc | Documentation of this feature (e.g. description). |
Methods
catalog_documentation
catalog_documentation() -> List[feature_lib.CatalogFeatureDocumentation]
Returns the feature documentation to be shown in the catalog.
cls_from_name
@classmethod
cls_from_name( python_class_name: str ) -> Type['FeatureConnector']
Returns the feature class for the given Python class name.
decode_batch_example
decode_batch_example(
tfexample_data
)
Decode multiple features batched in a single tf.Tensor.
This function is used to decode features wrapped in tfds.features.Sequence().
By default, this function applies decode_example on each individual element using tf.map_fn. However, for optimization, features can override this method to apply a custom batch decoding.
Args | |
---|---|
tfexample_data | Same tf.Tensor inputs as decode_example, but with an additional first dimension for the sequence length. |
Returns | |
---|---|
tensor_data | Tensor or dictionary of tensors, output of the tf.data.Dataset object |
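The default strategy described above (decode each element along the leading sequence dimension, which TFDS does with tf.map_fn) and the motivation for overriding it can be sketched in pure Python. The classes below are hypothetical and plain lists stand in for tensors; the override precomputes a lookup table so that batch decoding becomes a sequence of table reads.

```python
from typing import List


class ScaledByteFeature:
    """Hypothetical feature: decodes an 8-bit integer to a float in [0, 1]."""

    def decode_example(self, value: int) -> float:
        return value / 255.0

    def decode_batch_example(self, batch: List[int]) -> List[float]:
        # Default strategy: decode each element of the leading (sequence)
        # dimension independently, as tf.map_fn would.
        return [self.decode_example(v) for v in batch]


class LookupScaledByteFeature(ScaledByteFeature):
    """Overrides batch decoding with a precomputed lookup table."""

    def __init__(self) -> None:
        # Decode every possible 8-bit input once, up front.
        self._table = [self.decode_example(v) for v in range(256)]

    def decode_batch_example(self, batch: List[int]) -> List[float]:
        table = self._table
        return [table[v] for v in batch]


print(LookupScaledByteFeature().decode_batch_example([0, 51, 255]))  # [0.0, 0.2, 1.0]
```

The override must produce exactly what the default would; only the cost of producing it changes.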
decode_example
decode_example(
serialized_example, decoders=None
)
Decode the feature dict to TF compatible input.
Args | |
---|---|
tfexample_data | Data or dictionary of data, as read by the tf-example reader. It corresponds to the tf.Tensor() (or dict of tf.Tensor()) extracted from the tf.train.Example, matching the info defined in get_serialized_info(). |
Returns | |
---|---|
tensor_data | Tensor or dictionary of tensors, output of the tf.data.Dataset object |
decode_example_np
decode_example_np(
serialized_example, *, decoders=None
)
Decode the feature dict into NumPy-compatible output.
Args | |
---|---|
example_data | Value to convert to NumPy. |
Returns | |
---|---|
np_data | Data as NumPy-compatible type: either a Python primitive (bytes, int, etc.) or a NumPy array. |
decode_ragged_example
decode_ragged_example(
tfexample_data
)
Decode nested features from a tf.RaggedTensor.
This function is used to decode features wrapped in nested tfds.features.Sequence().
By default, this function applies decode_batch_example on the flat values of the ragged tensor. For optimization, features can override this method to apply a custom batch decoding.
Args | |
---|---|
tfexample_data | tf.RaggedTensor inputs containing the nested encoded examples. |
Returns | |
---|---|
tensor_data | The decoded tf.RaggedTensor or dictionary of tensors, output of the tf.data.Dataset object |
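Decoding through the flat values of a ragged tensor amounts to one batched decode followed by re-splitting the results by row. The sketch below is a pure-Python analogue that assumes a ragged value is represented as (flat_values, row_lengths); decode_ragged is a hypothetical helper, not a TFDS function.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def decode_ragged(
    flat_values: Sequence[T],
    row_lengths: Sequence[int],
    decode_batch: Callable[[Sequence[T]], List[U]],
) -> List[List[U]]:
    """Decodes all flat values in one batch call, then re-nests them by row."""
    if sum(row_lengths) != len(flat_values):
        raise ValueError("row_lengths must sum to len(flat_values)")
    decoded = decode_batch(flat_values)  # single batched call over all rows
    rows, start = [], 0
    for n in row_lengths:
        rows.append(decoded[start:start + n])
        start += n
    return rows


# Three inner sequences of lengths 2, 0 and 1.
rows = decode_ragged([1, 2, 3], [2, 0, 1], lambda b: [x * 10 for x in b])
print(rows)  # [[10, 20], [], [30]]
```

Because the row structure is set aside before decoding, the decoder only ever sees a flat batch, which is why the batched path can be reused unchanged.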
deserialize_example
deserialize_example(
serialized_example: Union[tf.Tensor, bytes], *, decoders=None
) -> utils.TensorDict
Decodes the tf.train.Example data into tf.Tensor.
See serialize_example to encode the data into proto.
Args | |
---|---|
serialized_example | The tensor-like object containing the serialized tf.train.Example proto. |
decoders | Optional decoders to apply (see documentation) |
Returns | |
---|---|
The decoded features tensors. |
deserialize_example_np
deserialize_example_np(
serialized_example: Union[tf.Tensor, bytes], *, decoders=None
) -> utils.NpArrayOrScalarDict
encode_example
encode_example(
example_ds: Union[Iterator[type_utils.TreeDict[Any]], Dict[str, Any]]
)
Encode the feature dict into tf-example compatible input.
The input example_data can be anything that the user passed at data generation. For example:
For features:
features={
'image': tfds.features.Image(),
'custom_feature': tfds.features.CustomFeature(),
}
At data generation (in _generate_examples), if the user yields:
yield {
'image': 'path/to/img.png',
'custom_feature': [123, 'str', lambda x: x+1]
}
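Then tfds.features.Image.encode_example receives 'path/to/img.png' as input, and tfds.features.CustomFeature.encode_example receives [123, 'str', lambda x: x+1] as input.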
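In other words, a features dictionary routes each top-level value to the encode_example of the sub-feature registered under the same key. The sketch below assumes plain functions in place of feature connectors; encode_features_dict and record are hypothetical helpers used only to show which value each sub-feature receives.

```python
from typing import Any, Callable, Dict


def encode_features_dict(
    encoders: Dict[str, Callable[[Any], Any]],
    example: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical dispatcher: sends each value to the encoder for its key."""
    if set(encoders) != set(example):
        raise ValueError(
            f"Keys mismatch: expected {sorted(encoders)}, got {sorted(example)}")
    return {key: encoders[key](value) for key, value in example.items()}


received: Dict[str, Any] = {}


def record(name: str) -> Callable[[Any], Any]:
    """Builds a stand-in encoder that remembers the value it was given."""
    def encode(value: Any) -> Any:
        received[name] = value
        return value
    return encode


encoded = encode_features_dict(
    {'image': record('image'), 'custom_feature': record('custom_feature')},
    {'image': 'path/to/img.png', 'custom_feature': [123, 'str']},
)
print(received)
```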
Args | |
---|---|
example_data | Value or dictionary of values to convert into tf-example compatible data. |
Returns | |
---|---|
tfexample_data | Data or dictionary of data to write as tf-example. Data can be a list or numpy array. Note that numpy arrays are flattened, so it is the feature connector's responsibility to reshape them in decode_example(). Note that tf.train.Example only supports int64, float32 and string, so the data returned here should be integer, float or string. User types can be restored in decode_example(). |
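Because tf.train.Example stores flat lists, a connector that writes a multi-dimensional value has to keep track of its shape and restore it on decoding. The pure-Python sketch below shows that round trip with hypothetical flatten and reshape helpers; nested lists stand in for NumPy arrays, and this is not the TFDS implementation.

```python
from typing import Any, List, Tuple


def flatten(nested: Any) -> Tuple[List[Any], Tuple[int, ...]]:
    """Flattens a rectangular nested list and returns (values, shape)."""
    if not isinstance(nested, list):
        return [nested], ()
    if not nested:
        return [], (0,)
    parts = [flatten(x) for x in nested]
    inner_shape = parts[0][1]
    if any(shape != inner_shape for _, shape in parts):
        raise ValueError("Nested list is not rectangular")
    values = [v for vals, _ in parts for v in vals]
    return values, (len(nested),) + inner_shape


def reshape(values: List[Any], shape: Tuple[int, ...]) -> Any:
    """Inverse of flatten: rebuilds the nested list from flat values."""
    if not shape:
        return values[0]
    n = len(values) // shape[0] if shape[0] else 0  # values per outer row
    return [reshape(values[i * n:(i + 1) * n], shape[1:]) for i in range(shape[0])]


values, shape = flatten([[1, 2, 3], [4, 5, 6]])  # what gets written
restored = reshape(values, shape)                # what decoding rebuilds
print(values, shape, restored)
```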
from_config
@classmethod
from_config( root_dir: str ) -> FeatureConnector
Reconstructs the FeatureConnector from the config file.
Usage:
features = FeatureConnector.from_config('path/to/dir')
Args | |
---|---|
root_dir | Directory containing the features.json file. |
Returns | |
---|---|
The reconstructed feature instance. |
from_json
@classmethod
from_json( value: Json ) -> FeatureConnector
FeatureConnector factory.
This function should be called from the tfds.features.FeatureConnector base class. Subclasses should implement from_json_content.
Example:
feature = tfds.features.FeatureConnector.from_json(
{'type': 'Image', 'content': {'shape': [32, 32, 3], 'dtype': 'uint8'} }
)
assert isinstance(feature, tfds.features.Image)
Args | |
---|---|
value | dict(type=, content=) containing the feature to restore. Matches the dict returned by to_json. |
Returns | |
---|---|
The reconstructed FeatureConnector. |
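The type/content convention turns from_json into a registry lookup: read type, find the matching class, and hand content to that class's from_json_content. The sketch below illustrates the pattern with hypothetical classes (ToyFeatureConnector, ToyImage) and a module-level registry filled by __init_subclass__; how TFDS registers its own classes is not shown here and may differ.

```python
from typing import Any, Dict, Type

# Hypothetical registry mapping "module.ClassName" to the feature class.
_REGISTRY: Dict[str, Type["ToyFeatureConnector"]] = {}


class ToyFeatureConnector:
    """Hypothetical base class following the {"type", "content"} convention."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself under its fully qualified name.
        _REGISTRY[f"{cls.__module__}.{cls.__qualname__}"] = cls

    @classmethod
    def from_json(cls, value: Dict[str, Any]) -> "ToyFeatureConnector":
        # Resolve the concrete class, then let it parse its own content.
        feature_cls = _REGISTRY[value["type"]]
        return feature_cls.from_json_content(value["content"])

    @classmethod
    def from_json_content(cls, content: Dict[str, Any]) -> "ToyFeatureConnector":
        raise NotImplementedError("Subclasses must override from_json_content.")


class ToyImage(ToyFeatureConnector):
    def __init__(self, shape, dtype: str) -> None:
        self.shape = tuple(shape)
        self.dtype = dtype

    @classmethod
    def from_json_content(cls, content: Dict[str, Any]) -> "ToyImage":
        return cls(shape=content["shape"], dtype=content["dtype"])


feature = ToyFeatureConnector.from_json({
    "type": f"{ToyImage.__module__}.ToyImage",
    "content": {"shape": [32, 32, 3], "dtype": "uint8"},
})
print(type(feature).__name__, feature.shape, feature.dtype)
```

Calling from_json on the base class, not on the subclass, is what lets a single entry point rebuild any registered feature.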
from_json_content
@classmethod
from_json_content( value: Union[Json, feature_pb2.Sequence] ) -> 'Sequence'
FeatureConnector factory (to override).
Subclasses should override this method. This method is used when importing the feature connector from the config.
This function should not be called directly. FeatureConnector.from_json should be called instead.
See existing FeatureConnectors for implementation examples.
Args | |
---|---|
value | FeatureConnector information represented as either Json or a Feature proto. The content must match what is returned by to_json_content. |
doc | Documentation of this feature (e.g. description). |
Returns | |
---|---|
The reconstructed FeatureConnector. |
from_proto
@classmethod
from_proto( feature_proto: feature_pb2.Feature ) -> T
Instantiates a feature from its proto representation.
get_serialized_info
get_serialized_info()
get_tensor_info
get_tensor_info()
Shape of one element of the dataset.
get_tensor_spec
get_tensor_spec() -> tf.data.DatasetSpec
Returns the tf.TensorSpec of this feature (not the element spec!).
Note that the output of this method may not correspond to the element spec of the dataset. For example, currently this method does not support RaggedTensorSpec.
load_metadata
load_metadata(
*args, **kwargs
)
See base class for details.
repr_html
repr_html(
ex: np.ndarray
) -> str
Returns the HTML str representation of the object.
repr_html_batch
repr_html_batch(
ex: np.ndarray
) -> str
Returns the HTML str representation of the object (Sequence).
repr_html_ragged
repr_html_ragged(
ex: np.ndarray
) -> str
Returns the HTML str representation of the object (Nested sequence).
save_config
save_config(
root_dir: str
) -> None
Exports the FeatureConnector to a file.
Args | |
---|---|
root_dir | path/to/dir containing the features.json |
save_metadata
save_metadata(
*args, **kwargs
)
See base class for details.
serialize_example
serialize_example(
example_data
) -> bytes
Encodes nested data values into tf.train.Example bytes.
See deserialize_example to decode the proto into tf.Tensor.
Args | |
---|---|
example_data | Example data to encode (numpy-like nested dict) |
Returns | |
---|---|
The serialized tf.train.Example. |
to_json
to_json() -> Json
Exports the FeatureConnector to Json.
Each feature is serialized as a dict(type=..., content=...).
type: The canonical name of the feature (module.FeatureName).
content: Specific to each feature connector and defined in to_json_content. Can contain nested sub-features (like for tfds.features.FeaturesDict and tfds.features.Sequence).
For example:
tfds.features.FeaturesDict({
'input': tfds.features.Image(),
'target': tfds.features.ClassLabel(num_classes=10),
})
Is serialized as:
{
"type": "tensorflow_datasets.core.features.features_dict.FeaturesDict",
"content": {
"input": {
"type": "tensorflow_datasets.core.features.image_feature.Image",
"content": {
"shape": [null, null, 3],
"dtype": "uint8",
"encoding_format": "png"
}
},
"target": {
"type":
"tensorflow_datasets.core.features.class_label_feature.ClassLabel",
"content": {
"num_classes": 10
}
}
}
}
Returns | |
---|---|
A dict(type=, content=). Will be forwarded to from_json when reconstructing the feature. |
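The nested structure shown in the example above comes from recursion: a container feature's content is built from the to_json output of each sub-feature, so every level carries its own type. The sketch below reproduces that shape with hypothetical classes (JsonFeature, ToyClassLabel, ToyFeaturesDict); the type strings it produces use the sketch's own module name, not the tensorflow_datasets paths.

```python
import json
from typing import Any, Dict


class JsonFeature:
    """Hypothetical base class: serializes as {"type": ..., "content": ...}."""

    def to_json_content(self) -> Dict[str, Any]:
        raise NotImplementedError

    def to_json(self) -> Dict[str, Any]:
        cls = type(self)
        return {
            "type": f"{cls.__module__}.{cls.__qualname__}",
            "content": self.to_json_content(),
        }


class ToyClassLabel(JsonFeature):
    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes

    def to_json_content(self) -> Dict[str, Any]:
        return {"num_classes": self.num_classes}


class ToyFeaturesDict(JsonFeature):
    def __init__(self, features: Dict[str, JsonFeature]) -> None:
        self.features = features

    def to_json_content(self) -> Dict[str, Any]:
        # Sub-features are serialized recursively, each with its own "type".
        return {name: f.to_json() for name, f in self.features.items()}


spec = ToyFeaturesDict({"target": ToyClassLabel(num_classes=10)}).to_json()
print(json.dumps(spec, indent=2))
```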
to_json_content
to_json_content() -> feature_pb2.Sequence
FeatureConnector factory (to override).
This function should be overridden by the subclass to allow re-importing the feature connector from the config. See existing FeatureConnectors for implementation examples.
Returns | |
---|---|
The FeatureConnector metadata in either a dict, or a Feature proto. This output is used in from_json_content when reconstructing the feature. |
to_proto
to_proto() -> feature_pb2.Feature
Exports the FeatureConnector to the Feature proto.
For features that have a specific schema defined in a proto, this function needs to be overridden. If there's no specific proto schema, then the feature will be represented using JSON.
Returns | |
---|---|
The feature proto describing this feature. |
__contains__
__contains__(
key: str
) -> bool
__getitem__
__getitem__(
key
)
Convenience method to access the underlying features.
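A minimal sketch of the kind of delegation __getitem__ and __contains__ describe, assuming the wrapped feature behaves like a dict of sub-features. ToySequence is hypothetical and the string values are placeholders; this is not the TFDS class.

```python
from typing import Any, Dict


class ToySequence:
    """Hypothetical wrapper that forwards lookups to its inner feature dict."""

    def __init__(self, feature: Dict[str, Any]) -> None:
        self._feature = feature

    def __getitem__(self, key: str) -> Any:
        # Indexing the wrapper returns the underlying sub-feature.
        return self._feature[key]

    def __contains__(self, key: str) -> bool:
        # `key in wrapper` checks the underlying sub-feature names.
        return key in self._feature


episode = ToySequence({"observation": "Image()", "reward": "int32"})
print("reward" in episode, episode["observation"])
```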
Class Variables | |
---|---|
ALIASES | [] |