Returns a Dataset of features from SequenceExample.
tfr.data.read_batched_sequence_example_dataset(
file_pattern,
batch_size,
list_size,
context_feature_spec,
example_feature_spec,
reader=tfr.keras.pipeline.DatasetHparams.dataset_reader,
reader_args=None,
num_epochs=None,
shuffle=True,
shuffle_buffer_size=1000,
shuffle_seed=None,
prefetch_buffer_size=32,
reader_num_threads=10,
sloppy_ordering=True,
drop_final_batch=False
)
Example:
data = [
sequence_example {
context {
feature {
key: "query_length"
value { int64_list { value: 3 } }
}
}
feature_lists {
feature_list {
key: "unigrams"
value {
feature { bytes_list { value: "tensorflow" } }
feature { bytes_list { value: ["learning", "to", "rank"] } }
}
}
feature_list {
key: "utility"
value {
feature { float_list { value: 0.0 } }
feature { float_list { value: 1.0 } }
}
}
}
}
sequence_example {
context {
feature {
key: "query_length"
value { int64_list { value: 2 } }
}
}
feature_lists {
feature_list {
key: "unigrams"
value {
feature { bytes_list { value: "gbdt" } }
feature { }
}
}
feature_list {
key: "utility"
value {
feature { float_list { value: 0.0 } }
feature { float_list { value: 0.0 } }
}
}
}
}
]
We can use the following arguments:
context_feature_spec: {
"query_length": tf.io.FixedLenFeature([1], tf.int64)
}
example_feature_spec: {
"unigrams": tf.io.VarLenFeature(tf.string),
"utility": tf.io.FixedLenFeature([1], tf.float32,
default_value=[0.])
}
batch_size: 2
And the expected output is:
{
"unigrams": SparseTensor(
indices=array([[0, 0, 0], [0, 1, 0], [0, 1, 1], [0, 1, 2], [1, 0, 0]]),
values=["tensorflow", "learning", "to", "rank", "gbdt"],
dense_shape=array([2, 2, 3])),
"utility": [[[ 0.], [ 1.]], [[ 0.], [ 0.]]],
"query_length": [[3], [2]],
}
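To make the relationship between indices, values, and dense_shape concrete, here is a minimal sketch in plain Python. It is not the library's parsing code: `to_sparse` is a hypothetical helper that walks a nested [batch][frame][token] list and emits one index triple per value, so the number of indices always equals the number of values.

```python
def to_sparse(batch):
    """Convert a nested [batch][frame][token] list into COO-style parts."""
    indices, values = [], []
    for b, frames in enumerate(batch):
        for f, tokens in enumerate(frames):
            for t, token in enumerate(tokens):
                indices.append([b, f, t])
                values.append(token)
    # The dense shape is the largest extent seen along each axis.
    dense_shape = [
        len(batch),
        max((len(frames) for frames in batch), default=0),
        max((len(tokens) for frames in batch for tokens in frames), default=0),
    ]
    return indices, values, dense_shape


# The "unigrams" feature of the two SequenceExamples in the example data.
# The second example has an empty second frame, so it contributes no index.
unigrams = [
    [["tensorflow"], ["learning", "to", "rank"]],
    [["gbdt"], []],
]
indices, values, dense_shape = to_sparse(unigrams)
print(indices)      # [[0, 0, 0], [0, 1, 0], [0, 1, 1], [0, 1, 2], [1, 0, 0]]
print(values)       # ['tensorflow', 'learning', 'to', 'rank', 'gbdt']
print(dense_shape)  # [2, 2, 3]
```

The empty frame still counts toward the second dimension of dense_shape, which is why the shape is [2, 2, 3] even though only five values are stored.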
Args |
file_pattern
|
(str | list(str)) List of files or patterns of file paths
containing tf.SequenceExample protos. See tf.io.gfile.glob for pattern
rules.
|
batch_size
|
(int) Number of records to combine in a single batch.
|
list_size
|
(int) The number of frames to keep in a SequenceExample. If
specified, truncation or padding may happen. Otherwise, set it to None to
allow dynamic list size.
|
context_feature_spec
|
(dict) A mapping from feature keys to
FixedLenFeature or VarLenFeature values.
|
example_feature_spec
|
(dict) A mapping from feature keys to FixedLenFeature or
VarLenFeature values.
|
reader
|
A function or class that can be called with a filenames tensor and
(optional) reader_args and returns a Dataset. Defaults to
tf.data.TFRecordDataset.
|
reader_args
|
(list) Additional arguments to pass to the reader class.
|
num_epochs
|
(int) Number of times to read through the dataset. If None,
cycles through the dataset forever. Defaults to None.
|
shuffle
|
(bool) Indicates whether the input should be shuffled. Defaults to
True.
|
shuffle_buffer_size
|
(int) Buffer size of the ShuffleDataset. A larger
buffer gives better shuffling but increases memory usage and
startup time. Defaults to 1000.
|
shuffle_seed
|
(int) Randomization seed to use for shuffling.
|
prefetch_buffer_size
|
(int) Number of feature batches to prefetch in order
to improve performance. The recommended value is the number of batches
consumed per training step. Defaults to 32.
|
reader_num_threads
|
(int) Number of threads used to read records. If greater
than 1, the results will be interleaved.
|
sloppy_ordering
|
(bool) If True, reading performance will be improved at
the cost of non-deterministic ordering. If False, the order of elements
produced is deterministic prior to shuffling (elements are still
randomized if shuffle=True; if the seed is set, the order of
elements after shuffling is also deterministic). Defaults to True.
|
drop_final_batch
|
(bool) If True, and the batch size does not evenly
divide the input dataset size, the final smaller batch will be dropped,
so that batch_size can be statically inferred. Defaults to False.
|
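As an illustration of how the arguments above fit together, the following is a minimal sketch of a call, assuming `tensorflow` and `tensorflow_ranking` are installed. The helper name `build_dataset`, the `READER_KWARGS` dict, the argument values, and the file path in the comments are all hypothetical; only the keyword names come from the signature documented here.

```python
# Keyword arguments that need no TensorFlow objects; values are illustrative.
READER_KWARGS = {
    "batch_size": 2,
    "list_size": 2,
    "num_epochs": 1,           # read the files once instead of cycling forever
    "shuffle": False,
    "sloppy_ordering": False,  # keep a deterministic element order
    "drop_final_batch": False,
}


def build_dataset(file_pattern):
    """Return a batched dataset read from files of SequenceExample protos."""
    # Imported here so this module can be loaded without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_ranking as tfr

    context_feature_spec = {
        "query_length": tf.io.FixedLenFeature([1], tf.int64),
    }
    example_feature_spec = {
        "unigrams": tf.io.VarLenFeature(tf.string),
        "utility": tf.io.FixedLenFeature([1], tf.float32, default_value=[0.0]),
    }
    return tfr.data.read_batched_sequence_example_dataset(
        file_pattern=file_pattern,
        context_feature_spec=context_feature_spec,
        example_feature_spec=example_feature_spec,
        reader=tf.data.TFRecordDataset,
        **READER_KWARGS,
    )


# Usage (hypothetical path):
#   dataset = build_dataset("/tmp/train-*.tfrecord")
#   for features in dataset.take(1):
#       print(features["utility"].shape)
```

Setting num_epochs=1 together with shuffle=False and sloppy_ordering=False is a convenient choice for evaluation or debugging, since the dataset is then finite and its order is reproducible.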
Returns |
A dataset of dict elements. Each dict maps feature keys to
Tensor or SparseTensor objects. Context features are mapped to a
rank-2 tensor of shape [batch_size, feature_size], and example features
are mapped to a rank-3 tensor of shape [batch_size, list_size,
feature_size], where list_size is the number of examples (frames) kept
per SequenceExample.
|
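The rank-3 shape of a dense example feature can be illustrated in plain Python. This is a sketch, not the library's implementation: `pad_or_truncate` is a hypothetical helper, and padding with the feature's default value is an assumption made for the illustration.

```python
def pad_or_truncate(frames, list_size, default):
    """Return exactly list_size frames, padding with copies of default."""
    if list_size is None:
        return list(frames)  # dynamic list size: keep every frame
    kept = list(frames[:list_size])
    # Assumption: missing frames are filled with the feature's default value.
    kept += [list(default) for _ in range(list_size - len(kept))]
    return kept


# A "utility"-like feature with feature_size 1 and differing frame counts.
raw_batch = [
    [[0.0], [1.0], [2.0]],  # three frames, truncated to two
    [[0.0]],                # one frame, padded to two
]
batch = [pad_or_truncate(frames, 2, [0.0]) for frames in raw_batch]
shape = [len(batch), len(batch[0]), len(batch[0][0])]
print(batch)  # [[[0.0], [1.0]], [[0.0], [0.0]]]
print(shape)  # [2, 2, 1], i.e. [batch_size, list_size, feature_size]
```

With list_size=None the helper leaves each list untouched, which corresponds to the dynamic list size described for the list_size argument.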