tf.feature_column.categorical_column_with_hash_bucket
Stay organized with collections
Save and categorize content based on your preferences.
Represents sparse feature where ids are set by hashing.
tf.feature_column.categorical_column_with_hash_bucket(
key, hash_bucket_size, dtype=tf.dtypes.string
)
Use this when your sparse features are in string or integer format, and you
want to distribute your inputs into a finite number of buckets by hashing.
output_id = Hash(input_feature_string) % bucket_size for string type input.
For int type input, the value is converted to its string representation first
and then hashed by the same formula.
For input dictionary features
, features[key]
is either Tensor
or
SparseTensor
. If Tensor
, missing values can be represented by -1
for int
and ''
for string, which will be dropped by this feature column.
Example:
keywords = categorical_column_with_hash_bucket("keywords", 10K)
columns = [keywords, ...]
features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
linear_prediction = linear_model(features, columns)
# or
keywords_embedded = embedding_column(keywords, 16)
columns = [keywords_embedded, ...]
features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
dense_tensor = input_layer(features, columns)
Args |
key
|
A unique string identifying the input feature. It is used as the
column name and the dictionary key for feature parsing configs, feature
Tensor objects, and feature columns.
|
hash_bucket_size
|
An int > 1. The number of buckets.
|
dtype
|
The type of features. Only string and integer types are supported.
|
Returns |
A HashedCategoricalColumn .
|
Raises |
ValueError
|
hash_bucket_size is not greater than 1.
|
ValueError
|
dtype is neither string nor integer.
|
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2020-10-01 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2020-10-01 UTC."],[],[]]