tfio.experimental.columnar.parse_avro
Parses `avro` records into a `dict` of tensors.
tfio.experimental.columnar.parse_avro(
serialized, reader_schema, features, avro_names=None, name=None
)
This op parses serialized Avro records into a dictionary mapping keys to `Tensor` and `SparseTensor` objects. `features` is a dict from keys to `VarLenFeature`, `SparseFeature`, `RaggedFeature`, and `FixedLenFeature` objects. Each `VarLenFeature` and `SparseFeature` is mapped to a `SparseTensor`; each `FixedLenFeature` is mapped to a `Tensor`.
Each `VarLenFeature` maps to a `SparseTensor` of the specified type representing a ragged matrix. Its indices are `[batch, index]`, where `batch` identifies the example in `serialized` and `index` is the value's index in the list of values associated with that feature and example.
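To make the `[batch, index]` layout concrete, here is a minimal stdlib-only sketch that builds the `indices`, `values`, and `dense_shape` of a `VarLenFeature` result from plain Python lists. The helper name `varlen_to_sparse` is hypothetical and this is not the tfio implementation; it also assumes that `dense_shape` is `[batch_size, longest list]`, as it is for `tf.io.parse_example`.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def varlen_to_sparse(
    batch: Sequence[Sequence[T]],
) -> Tuple[List[List[int]], List[T], List[int]]:
    """Hypothetical helper: lay out one VarLenFeature column as a SparseTensor.

    batch[b] holds the values of the feature in example b of `serialized`.
    """
    indices: List[List[int]] = []
    values: List[T] = []
    for batch_pos, example_values in enumerate(batch):
        for index, value in enumerate(example_values):
            # [batch, index]: which example, and the position in its value list.
            indices.append([batch_pos, index])
            values.append(value)
    # Assumption: dense_shape is [batch_size, longest list], as in tf.io.parse_example.
    max_len = max((len(v) for v in batch), default=0)
    return indices, values, [len(batch), max_len]


indices, values, dense_shape = varlen_to_sparse([[7, 8], [], [9]])
print(indices)      # [[0, 0], [0, 1], [2, 0]]
print(values)       # [7, 8, 9]
print(dense_shape)  # [3, 2]
```

Empty examples (like the second one above) contribute no entries, which is what makes the result a ragged matrix rather than a padded one.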
Each `SparseFeature` maps to a `SparseTensor` of the specified type representing a tensor of `dense_shape` `[batch_size] + SparseFeature.size`. Its `values` come from the feature in the examples with key `value_key`. Each `values[i]` comes from a position `k` in the feature of the example at batch entry `batch`. This positional information is recorded in `indices[i]` as `[batch, index_0, index_1, ...]`, where `index_j` is the `k`-th value of the feature with key `SparseFeature.index_key[j]` in that example. In other words, the indices of a `SparseTensor` (except the first index, which identifies the batch entry) are split by dimension into different features of the Avro record. Because of this complexity, prefer a `VarLenFeature` over a `SparseFeature` whenever possible.
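The index-splitting rule is easier to follow on a small batch. The sketch below assumes Avro records shown as Python dicts with hypothetical fields `ix0`, `ix1`, and `val`, roughly what a declaration such as `tf.io.SparseFeature(index_key=["ix0", "ix1"], value_key="val", dtype=tf.float32, size=[3, 4])` would describe. The helper `sparse_feature_to_sparse` is illustrative only and is not part of tfio.

```python
from typing import Dict, List, Sequence, Tuple


def sparse_feature_to_sparse(
    records: Sequence[Dict[str, list]],
    index_keys: Sequence[str],
    value_key: str,
    size: Sequence[int],
) -> Tuple[List[List[int]], list, List[int]]:
    """Hypothetical helper mirroring the documented SparseFeature layout.

    Not the tfio implementation: it only shows how indices[i] is assembled.
    """
    indices: List[List[int]] = []
    values: list = []
    for batch_pos, record in enumerate(records):
        record_values = record[value_key]
        index_columns = [record[key] for key in index_keys]
        # Every index field must supply exactly one coordinate per value.
        for column in index_columns:
            if len(column) != len(record_values):
                raise ValueError(f"record {batch_pos}: index and value lengths differ")
        for k, value in enumerate(record_values):
            # indices[i] = [batch, index_0, index_1, ...], where index_j is the
            # k-th value of the field named index_keys[j].
            indices.append([batch_pos] + [column[k] for column in index_columns])
            values.append(value)
    # dense_shape = [batch_size] + SparseFeature.size
    return indices, values, [len(records)] + list(size)


records = [
    {"ix0": [0, 2], "ix1": [1, 3], "val": [0.5, 1.5]},
    {"ix0": [1], "ix1": [0], "val": [2.5]},
]
indices, values, dense_shape = sparse_feature_to_sparse(
    records, index_keys=["ix0", "ix1"], value_key="val", size=[3, 4]
)
print(indices)      # [[0, 0, 1], [0, 2, 3], [1, 1, 0]]
print(values)       # [0.5, 1.5, 2.5]
print(dense_shape)  # [2, 3, 4]
```

Each record has to keep its index fields and its value field the same length, which is one reason a `VarLenFeature` is simpler whenever it is enough.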
Each `FixedLenFeature` `df` maps to a `Tensor` of the specified type (or `tf.float32` if not specified) and shape `(serialized.size(),) + df.shape`. `FixedLenFeature` entries with a `default_value` are optional. Without a default value, parsing fails if that feature is missing from any example in `serialized`.
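The default-value rule and the output shape can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the tfio code path: records are Python dicts, an absent key stands in for a feature missing from that example, and, as with `FixedLenFeature`, `default_value=None` means "no default".

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def fixed_len_column(
    records: Sequence[Dict[str, Any]], key: str, default_value: Optional[Any] = None
) -> list:
    """Hypothetical helper: gather one FixedLenFeature across a batch.

    default_value=None means "no default", which makes the feature required.
    """
    column = []
    for batch_pos, record in enumerate(records):
        if key in record:
            column.append(record[key])
        elif default_value is not None:
            # Optional feature: substitute the default for the missing value.
            column.append(default_value)
        else:
            # Required feature: one missing example fails the whole batch.
            raise ValueError(f"feature {key!r} is missing from example {batch_pos}")
    return column


def fixed_len_output_shape(batch_size: int, feature_shape: Sequence[int]) -> Tuple[int, ...]:
    """Output shape (serialized.size(),) + df.shape."""
    return (batch_size,) + tuple(feature_shape)


records = [{"age": 31}, {}, {"age": 47}]
print(fixed_len_column(records, "age", default_value=-1))  # [31, -1, 47]
print(fixed_len_output_shape(len(records), []))            # (3,)
print(fixed_len_output_shape(len(records), [2, 5]))        # (3, 2, 5)
```

Calling `fixed_len_column(records, "age")` without a default raises `ValueError` on the second record, matching the rule that a required feature must be present in every example.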
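Use this op as the mapping function of `dataset.map`, for example `dataset.map(lambda x: parse_avro(x, reader_schema, features))`.

This op only works on batched serialized input, so batch the record dataset before calling `map`.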
| Args | |
|---|---|
| `serialized` | The batched, serialized string tensors. |
| `reader_schema` | The reader schema. Note: this MUST match the reader schema of the `avro_record_dataset`; otherwise, this op will segfault! |
| `features` | A map from feature names to feature information (`VarLenFeature`, `SparseFeature`, `RaggedFeature`, or `FixedLenFeature` objects). |
| `avro_names` | (Optional.) Descriptive names for the corresponding serialized Avro records. These may be useful for debugging, but they have no effect on the output. If not `None`, `avro_names` must be the same length as `serialized`. |
| `name` | (Optional.) A name for the op. |
| Returns |
|---|
| A map of feature names to tensors. |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-02-15 UTC.