tf.io.decode_proto
The op extracts fields from a serialized protocol buffers message into tensors.
tf.io.decode_proto(
    bytes, message_type, field_names, output_types, descriptor_source='local://',
    message_format='binary', sanitize=False, name=None
)
The decode_proto op extracts fields from a serialized protocol buffers message into tensors. The fields in field_names are decoded and converted to the corresponding output_types if possible.
A message_type name must be provided to give context for the field names. The actual message descriptor can be looked up either in the linked-in descriptor pool or in a file whose name the caller provides through the descriptor_source attribute.
Each output tensor is a dense tensor. This means that it is padded to hold the largest number of repeated elements seen in the input minibatch. (The shape is also padded by one to prevent zero-sized dimensions.) The actual repeat counts for each example in the minibatch can be found in the sizes output. In many cases the output of decode_proto is fed immediately into tf.squeeze if missing values are not a concern. When using tf.squeeze, always pass the squeeze dimension explicitly to avoid surprises.
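The relationship between ragged repeated fields, the dense values output, and the sizes output can be illustrated without TensorFlow. The following is a pure-Python sketch, not the op's implementation: to_dense is a hypothetical helper, the pad value of 0 is an assumption, and the extra padding by one mentioned above is not modeled.

```python
from typing import List, Sequence, Tuple


def to_dense(
    batch: Sequence[Sequence[int]], pad_value: int = 0
) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical helper: pad each example's repeated values to a common width."""
    # Actual repeat count of the field in each example of the minibatch.
    sizes = [len(example) for example in batch]
    # The dense width is the largest repeat count seen in the minibatch.
    width = max(sizes, default=0)
    # Fill shorter rows with the (assumed) pad value.
    values = [
        list(example) + [pad_value] * (width - len(example)) for example in batch
    ]
    return sizes, values


# Three examples holding 2, 0, and 3 repeated elements respectively.
sizes, values = to_dense([[1, 2], [], [3, 4, 5]])
print(sizes)   # [2, 0, 3]
print(values)  # [[1, 2, 0], [0, 0, 0], [3, 4, 5]]
```

Only the first sizes[i] entries of row i are real data; the rest is padding, which is why sizes should be consulted before the values are squeezed or reduced.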
For the most part, the mapping between Proto field types and TensorFlow dtypes
is straightforward. However, there are a few special cases:
- A proto field that contains a submessage or group can only be converted to DT_STRING (the serialized submessage). This is to reduce the complexity of the API. The resulting string can be used as input to another instance of the decode_proto op.
- TensorFlow lacks support for unsigned integers. The ops represent uint64 types as a DT_INT64 with the same two's-complement bit pattern. Unsigned int32 values can be represented exactly by specifying type DT_INT64, or using two's complement if the caller specifies DT_INT32 in the output_types attribute.
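The unsigned-integer case can be made concrete with a small standard-library sketch. It only demonstrates the two's-complement reinterpretation described above; the helper names are hypothetical and nothing here calls TensorFlow.

```python
import struct


def uint64_as_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed, keeping the same bits."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def int64_as_uint64(value: int) -> int:
    """Recover the original unsigned value from its signed representation."""
    return value & 0xFFFFFFFFFFFFFFFF


def uint32_as_int32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed, keeping the same bits."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(uint64_as_int64(2**64 - 1))   # -1
print(uint64_as_int64(2**63))       # -9223372036854775808
print(int64_as_uint64(-1))          # 18446744073709551615
print(uint32_as_int32(4294967295))  # -1
```

A uint32 value such as 4294967295 fits in a signed 64-bit integer unchanged, which is why requesting DT_INT64 preserves it exactly, while requesting DT_INT32 yields the wrapped value -1.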
Both binary and text proto serializations are supported, and can be chosen using the message_format attribute.
The descriptor_source attribute selects the source of protocol descriptors to consult when looking up message_type. This may be:
- An empty string or "local://", in which case protocol descriptors are created for C++ (not Python) proto definitions linked to the binary.
- A file, in which case protocol descriptors are created from the file, which is expected to contain a FileDescriptorSet serialized as a string. NOTE: You can build a descriptor_source file using the --descriptor_set_out and --include_imports options to the protocol compiler protoc.
- A "bytes://<bytes>", in which case protocol descriptors are created from <bytes>, which is expected to be a FileDescriptorSet serialized as a string.
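For the file-based option, the protoc invocation from the note above might be assembled as follows. This is a sketch under assumptions: build_protoc_command is a hypothetical helper, example.proto and example.desc are placeholder names, and the command is only constructed here, not executed.

```python
from typing import List, Sequence


def build_protoc_command(
    proto_files: Sequence[str], out_path: str, proto_path: str = "."
) -> List[str]:
    """Hypothetical helper: argv that asks protoc to write a FileDescriptorSet."""
    return [
        "protoc",
        f"--proto_path={proto_path}",        # where protoc searches for .proto files
        "--include_imports",                 # also embed descriptors of imported files
        f"--descriptor_set_out={out_path}",  # destination of the serialized set
        *proto_files,
    ]


command = build_protoc_command(["example.proto"], "example.desc")
print(" ".join(command))

# To generate the file for real (requires protoc on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

The path of the generated file (example.desc in this sketch) would then be passed as descriptor_source.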
Args

| Argument | Description |
|---|---|
| bytes | A Tensor of type string. Tensor of serialized protos with shape batch_shape. |
| message_type | A string. Name of the proto message type to decode. |
| field_names | A list of strings. List of strings containing proto field names. An extension field can be decoded by using its full name, e.g. EXT_PACKAGE.EXT_FIELD_NAME. |
| output_types | A list of tf.DTypes. List of TF types to use for the respective field in field_names. |
| descriptor_source | An optional string. Defaults to "local://". Either the special value local:// or a path to a file containing a serialized FileDescriptorSet. |
| message_format | An optional string. Defaults to "binary". Either binary or text. |
| sanitize | An optional bool. Defaults to False. Whether to sanitize the result or not. |
| name | A name for the operation (optional). |
Returns

A tuple of Tensor objects (sizes, values).

| Output | Description |
|---|---|
| sizes | A Tensor of type int32. |
| values | A list of Tensor objects of type output_types. |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2020-10-01 UTC.