Maps x to a vocabulary specified by the deferred tensor.
tft.apply_vocabulary(
x: common_types.ConsistentTensorType,
deferred_vocab_filename_tensor: common_types.TemporaryAnalyzerOutputType,
*,
default_value: Any = -1,
num_oov_buckets: int = 0,
lookup_fn: Optional[Callable[[common_types.TensorType, tf.Tensor], Tuple[tf.Tensor, tf.Tensor]]] = None,
file_format: common_types.VocabularyFileFormatType = analyzers.DEFAULT_VOCABULARY_FILE_FORMAT,
name: Optional[str] = None
) -> common_types.ConsistentTensorType
This function also writes domain statistics about the vocabulary min and max
values. Note that the min and max are inclusive, and depend on the vocab size,
num_oov_buckets and default_value.
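As a rough illustration of how the inclusive min and max could depend on these three quantities, the following pure-Python sketch computes a plausible ID range. The helper name `vocab_domain` is hypothetical and not part of the library. The rule it encodes (OOV buckets placed directly after the in-vocabulary IDs, and default_value widening the range only when there are no OOV buckets) is an assumption about the behavior, not a copy of the implementation.

```python
def vocab_domain(vocab_size, num_oov_buckets=0, default_value=-1):
    """Return an inclusive (min, max) of the IDs a lookup could emit.

    Hypothetical helper: it encodes an assumed rule, not library code.
    """
    if num_oov_buckets > 0:
        # Assumption: OOV tokens hash into buckets numbered right after
        # the in-vocabulary IDs, so default_value is never emitted.
        return 0, vocab_size + num_oov_buckets - 1
    # Without OOV buckets, default_value can appear next to 0..vocab_size-1.
    return min(0, default_value), max(vocab_size - 1, default_value)


print(vocab_domain(3))                     # (-1, 2)
print(vocab_domain(3, num_oov_buckets=2))  # (0, 4)
```

Under these assumptions, a three-entry vocabulary with the default settings spans -1 through 2, while adding two OOV buckets shifts the range to 0 through 4.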
Args |
x
|
A categorical Tensor, SparseTensor, or RaggedTensor of type
tf.string or tf.int[8|16|32|64] to which the vocabulary transformation
should be applied. The column names are those intended for the transformed
tensors.
|
deferred_vocab_filename_tensor
|
The deferred vocab filename tensor as
returned by tft.vocabulary, as long as the frequencies were not stored.
|
default_value
|
The value to use for out-of-vocabulary values, unless
'num_oov_buckets' is greater than zero.
|
num_oov_buckets
|
Any lookup of an out-of-vocabulary token will return a
bucket ID based on its hash if num_oov_buckets is greater than zero.
Otherwise it is assigned the default_value.
|
lookup_fn
|
Optional lookup function. If specified, it should take a tensor
and a deferred vocab filename as input and return a lookup op along
with the table size. By default, apply_vocabulary constructs a
StaticHashTable for the table lookup.
|
file_format
|
(Optional) A str. The format of the given vocabulary. Accepted
formats are: 'tfrecord_gzip', 'text'. The default value is 'text'.
|
name
|
(Optional) A name for this operation.
|
Returns |
A Tensor, SparseTensor, or RaggedTensor where each string value is
mapped to an integer. Each unique string value that appears in the
vocabulary is mapped to a different integer, with integers consecutive
starting from zero. Any string value not in the vocabulary is assigned
default_value, or a hash-based bucket ID if num_oov_buckets is greater
than zero.
|
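The mapping described above can be emulated in plain Python to make the semantics concrete. This is only a sketch: `apply_vocab` is a hypothetical stand-in, not the library function. It takes an in-memory list rather than a deferred vocabulary file, and it uses `zlib.crc32` as a placeholder hash, so the specific bucket chosen for an out-of-vocabulary token will generally differ from what the real lookup table produces. The placement of bucket IDs directly after the in-vocabulary IDs is likewise an assumption.

```python
import zlib


def apply_vocab(tokens, vocab, default_value=-1, num_oov_buckets=0):
    """Map tokens to integer IDs, emulating the documented semantics."""
    # In-vocabulary tokens get consecutive IDs starting from zero.
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = []
    for tok in tokens:
        if tok in index:
            ids.append(index[tok])
        elif num_oov_buckets > 0:
            # Placeholder hash (assumption); the real table uses its own
            # fingerprint function, so bucket assignments may differ.
            bucket = zlib.crc32(tok.encode("utf-8")) % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            ids.append(default_value)
    return ids


vocab = ["the", "cat", "sat"]
print(apply_vocab(["cat", "dog", "the"], vocab))                     # [1, -1, 0]
print(apply_vocab(["cat", "dog", "the"], vocab, num_oov_buckets=1))  # [1, 3, 0]
```

With a single OOV bucket every unknown token lands on the same ID, which is why the second call is deterministic regardless of the hash used.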