tfio.experimental.columnar.make_avro_record_dataset
Reads and (optionally) parses Avro files into a dataset.
tfio.experimental.columnar.make_avro_record_dataset(
file_pattern, features, batch_size, reader_schema, reader_buffer_size=None,
num_epochs=None, shuffle=True, shuffle_buffer_size=None, shuffle_seed=None,
prefetch_buffer_size=tf.data.experimental.AUTOTUNE, num_parallel_reads=None,
drop_final_batch=False
)
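View source on GitHub: https://github.com/tensorflow/io/blob/v0.24.0/tensorflow_io/python/experimental/make_avro_record_dataset.py#L26-L117
Used in the tutorials: Avro Dataset API (https://www.tensorflow.org/io/tutorials/avro)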
Provides common functionality such as batching, optional parsing, shuffling,
and applying defaults.
Args:
file_pattern: List of files or patterns of Avro file paths.
See tf.io.gfile.glob for pattern rules.
features: A map of feature names mapped to feature information.
batch_size: An int representing the number of records to combine
in a single batch.
reader_schema: The reader schema.
reader_buffer_size: (Optional.) An int specifying the reader's buffer
size in bytes. If None (the default), the default value from
AvroRecordDataset is used.
num_epochs: (Optional.) An int specifying the number of times this
dataset is repeated. If None (the default), cycles through the
dataset forever. When num_epochs is None, the final batch is dropped.
shuffle: (Optional.) A bool that indicates whether the input
should be shuffled. Defaults to True.
shuffle_buffer_size: (Optional.) Buffer size to use for
shuffling. A large buffer size ensures better shuffling, but
increases memory usage and startup time. If not provided
assumes default value of 10,000 records. Note that the shuffle
size is measured in records.
shuffle_seed: (Optional.) Randomization seed to use for shuffling.
By default uses a pseudo-random seed.
prefetch_buffer_size: (Optional.) An int specifying the number of
feature batches to prefetch for performance improvement.
Defaults to auto-tune. Set to 0 to disable prefetching.
num_parallel_reads: (Optional.) Number of records to parse in
parallel. Defaults to None (no parallelization).
drop_final_batch: (Optional.) Whether the last batch should be
dropped in case its size is smaller than batch_size; the
default behavior is not to drop the smaller batch.
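The effect of drop_final_batch on a finite dataset can be illustrated with a few lines of plain Python. This is only a sketch of the arithmetic and does not call TensorFlow I/O. The helper name batch_sizes is hypothetical, and it covers only the case where num_epochs is set, so the number of records is finite.

```python
def batch_sizes(num_records, batch_size, drop_final_batch=False):
    """Return the size of each batch produced from num_records records.

    Illustrative helper only; it mirrors the documented behavior of
    drop_final_batch and is not part of the library.
    """
    full_batches, remainder = divmod(num_records, batch_size)
    sizes = [batch_size] * full_batches
    # A trailing partial batch is kept unless drop_final_batch is set.
    if remainder and not drop_final_batch:
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4))                         # [4, 4, 2]
print(batch_sizes(10, 4, drop_final_batch=True))  # [4, 4]
```

With 10 records and batch_size=4, the default keeps the final batch of 2 records, while drop_final_batch=True yields only the two full batches.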
Returns:
A dataset, where each element matches the output of parser_fn
except it will have an additional leading batch-size dimension,
or a batch_size-length 1-D tensor of strings if parser_fn is
unspecified.
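A minimal usage sketch, assuming tensorflow and tensorflow_io are installed. The reader schema, the field names label and name, and the helper build_dataset are hypothetical and chosen only for illustration; a real reader schema must match the Avro files being read.

```python
import json

# Hypothetical reader schema; replace it with the schema of your own Avro files.
READER_SCHEMA = json.dumps({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "label", "type": "int"},
        {"name": "name", "type": "string"},
    ],
})


def build_dataset(file_pattern, batch_size=32):
    """Build a batched dataset from Avro files (sketch under the assumptions above)."""
    # Imported lazily so the schema can be inspected without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_io as tfio

    # Map of feature names to feature information; keys name fields in the schema.
    features = {
        "label": tf.io.FixedLenFeature(shape=[], dtype=tf.int32),
        "name": tf.io.FixedLenFeature(shape=[], dtype=tf.string),
    }
    return tfio.experimental.columnar.make_avro_record_dataset(
        file_pattern=file_pattern,
        features=features,
        batch_size=batch_size,
        reader_schema=READER_SCHEMA,
        num_epochs=1,   # read the data once instead of repeating forever
        shuffle=False,  # keep file order, which makes inspection easier
    )
```

Calling build_dataset(["train.avro"]) with a placeholder path and iterating over the result should yield batches of parsed features, as in the Avro Dataset API tutorial. Setting num_epochs=1 avoids the default of cycling through the data forever.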
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-02-15 UTC.