The op extracts fields from a serialized protocol buffers message into tensors.
Note: This API is designed for orthogonality rather than human-friendliness. It can be used to parse input protos by hand, but it is intended for use in generated code.
The `decode_proto` op extracts fields from a serialized protocol buffers message into tensors. The fields in `field_names` are decoded and converted to the corresponding `output_types` if possible.
A `message_type` name must be provided to give context for the field names. The actual message descriptor can be looked up either in the linked-in descriptor pool or a filename provided by the caller using the `descriptor_source` attribute.
Each output tensor is a dense tensor. This means that it is padded to hold the largest number of repeated elements seen in the input minibatch. (The shape is also padded by one to prevent zero-sized dimensions). The actual repeat counts for each example in the minibatch can be found in the `sizes` output. In many cases the output of `decode_proto` is fed immediately into tf.squeeze if missing values are not a concern. When using tf.squeeze, always pass the squeeze dimension explicitly to avoid surprises.
For the most part, the mapping between Proto field types and TensorFlow dtypes is straightforward. However, there are a few special cases:
- A proto field that contains a submessage or group can only be converted to `DT_STRING` (the serialized submessage). This is to reduce the complexity of the API. The resulting string can be used as input to another instance of the decode_proto op.
- TensorFlow lacks support for unsigned integers. The ops represent uint64 types as a `DT_INT64` with the same twos-complement bit pattern (the obvious way). Unsigned int32 values can be represented exactly by specifying type `DT_INT64`, or using twos-complement if the caller specifies `DT_INT32` in the `output_types` attribute.
- `map` fields are not directly decoded. They are treated as `repeated` fields,
of the appropriate entry type. The proto-compiler defines entry types for each
map field. The type-name is the field name, converted to "CamelCase" with
"Entry" appended. The tf.train.Features.FeatureEntry
message is an example of
one of these implicit `Entry` types.
- `enum` fields should be read as int32.
Both binary and text proto serializations are supported, and can be chosen using the `format` attribute.
The `descriptor_source` attribute selects the source of protocol descriptors to consult when looking up `message_type`. This may be:
- An empty string or "local://", in which case protocol descriptors are created for C++ (not Python) proto definitions linked to the binary.
- A file, in which case protocol descriptors are created from the file, which is expected to contain a `FileDescriptorSet` serialized as a string. NOTE: You can build a `descriptor_source` file using the `--descriptor_set_out` and `--include_imports` options to the protocol compiler `protoc`.
- A "bytes://
Nested Classes
class | DecodeProto.Options | Optional attributes for DecodeProto
|
Public Methods
static DecodeProto |
create(Scope scope, Operand<String> bytes, String messageType, List<String> fieldNames, List<Class<?>> outputTypes, Options... options)
Factory method to create a class wrapping a new DecodeProto operation.
|
static DecodeProto.Options |
descriptorSource(String descriptorSource)
|
static DecodeProto.Options |
messageFormat(String messageFormat)
|
static DecodeProto.Options |
sanitize(Boolean sanitize)
|
Output<Integer> |
sizes()
Tensor of int32 with shape `[batch_shape, len(field_names)]`.
|
List<Output<?>> |
values()
List of tensors containing values for the corresponding field.
|
Inherited Methods
Public Methods
public static DecodeProto create (Scope scope, Operand<String> bytes, String messageType, List<String> fieldNames, List<Class<?>> outputTypes, Options... options)
Factory method to create a class wrapping a new DecodeProto operation.
Parameters
scope | current scope |
---|---|
bytes | Tensor of serialized protos with shape `batch_shape`. |
messageType | Name of the proto message type to decode. |
fieldNames | List of strings containing proto field names. An extension field can be decoded by using its full name, e.g. EXT_PACKAGE.EXT_FIELD_NAME. |
outputTypes | List of TF types to use for the respective field in field_names. |
options | carries optional attributes values |
Returns
- a new instance of DecodeProto
public static DecodeProto.Options descriptorSource (String descriptorSource)
Parameters
descriptorSource | Either the special value `local://` or a path to a file containing a serialized `FileDescriptorSet`. |
---|
public static DecodeProto.Options messageFormat (String messageFormat)
Parameters
messageFormat | Either `binary` or text .
|
---|
public static DecodeProto.Options sanitize (Boolean sanitize)
Parameters
sanitize | Whether to sanitize the result or not. |
---|
public Output<Integer> sizes ()
Tensor of int32 with shape `[batch_shape, len(field_names)]`. Each entry is the number of values found for the corresponding field. Optional fields may have 0 or 1 values.
public List<Output<?>> values ()
List of tensors containing values for the corresponding field. `values[i]` has datatype `output_types[i]` and shape `[batch_shape, max(sizes[...,i])]`.