tf.feature_column.crossed_column
Returns a column for performing crosses of categorical features. (deprecated)
tf.feature_column.crossed_column(
keys, hash_bucket_size, hash_key=None
)
Deprecated: This function is deprecated and will be removed in a future version. Use tf.keras.layers.experimental.preprocessing.HashedCrossing instead for feature crossing when preprocessing data to train a Keras model.
Warning: tf.feature_column is not recommended for new code. Feature preprocessing can instead be done directly with Keras preprocessing layers or through the tf.keras.utils.FeatureSpace utility built on top of them. See the migration guide (https://www.tensorflow.org/guide/migrate) for details.
Crossed features will be hashed according to hash_bucket_size. Conceptually, the transformation can be thought of as:
Hash(cartesian product of features) % hash_bucket_size
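For example, if the input features are:
- SparseTensor referred by first key:
shape = [2, 2]
{
[0, 0]: "a"
[1, 0]: "b"
[1, 1]: "c"
}
- SparseTensor referred by second key:
shape = [2, 1]
{
[0, 0]: "d"
[1, 0]: "e"
}
then the crossed feature will look like: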
shape = [2, 2]
{
[0, 0]: Hash64("d", Hash64("a")) % hash_bucket_size
[1, 0]: Hash64("e", Hash64("b")) % hash_bucket_size
[1, 1]: Hash64("e", Hash64("c")) % hash_bucket_size
}
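The crossing step can be sketched in plain Python. This is only a conceptual illustration: `stable_hash64` and `cross_row` are hypothetical helpers, and SHA-256 stands in for the fingerprint function TensorFlow actually uses, so the bucket ids will not match those produced by crossed_column.

```python
import hashlib
from itertools import product


def stable_hash64(text: str) -> int:
    """Stand-in 64-bit hash (assumption: NOT TensorFlow's fingerprint function)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def cross_row(feature_values, hash_bucket_size):
    """Hash every combination of one example's feature values into a bucket id."""
    buckets = []
    for combo in product(*feature_values):
        # "_X_" is an arbitrary separator chosen for this sketch.
        buckets.append(stable_hash64("_X_".join(combo)) % hash_bucket_size)
    return buckets


# Rows mirror the example: row 0 crosses "a" with "d",
# row 1 crosses {"b", "c"} with "e".
first = [["a"], ["b", "c"]]
second = [["d"], ["e"]]
crossed = [cross_row([f, s], 50_000) for f, s in zip(first, second)]
```

Row 0 yields one bucket id and row 1 yields two, which matches the [2, 2] sparse shape of the crossed feature.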
Here is an example to create a linear model with crosses of string features:
keywords_x_doc_terms = crossed_column(['keywords', 'doc_terms'], 50000)
columns = [keywords_x_doc_terms, ...]
features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
linear_prediction = linear_model(features, columns)
You could also use vocabulary lookup before crossing:
keywords = categorical_column_with_vocabulary_file(
    'keywords', '/path/to/vocabulary/file', vocabulary_size=1000)
keywords_x_doc_terms = crossed_column([keywords, 'doc_terms'], 50000)
columns = [keywords_x_doc_terms, ...]
features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
linear_prediction = linear_model(features, columns)
If an input feature is of numeric type, you can use categorical_column_with_identity or bucketized_column, as in the example:
# vertical_id is an integer categorical feature.
vertical_id = categorical_column_with_identity('vertical_id', 10000)
price = numeric_column('price')
# bucketized_column converts a numerical feature to a categorical one.
bucketized_price = bucketized_column(price, boundaries=[...])
vertical_id_x_price = crossed_column([vertical_id, bucketized_price], 50000)
columns = [vertical_id_x_price, ...]
features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
linear_prediction = linear_model(features, columns)
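To make the bucketizing step concrete, here is a minimal pure-Python sketch of how a numeric value maps to a bucket id. It assumes left-inclusive, right-exclusive buckets, which is how bucketized_column describes its boundaries; the `bucketize` helper and the boundary values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return the bucket index for value, given sorted boundaries.

    With boundaries [b0, b1, b2] the buckets are
    (-inf, b0), [b0, b1), [b1, b2), [b2, +inf).
    """
    return bisect_right(boundaries, value)


boundaries = [10.0, 50.0, 100.0]  # hypothetical price boundaries
prices = [5.0, 10.0, 49.99, 100.0, 250.0]
bucket_ids = [bucketize(p, boundaries) for p in prices]
print(bucket_ids)  # [0, 1, 1, 3, 3]
```

The resulting integer bucket id is what gets crossed with vertical_id, so n boundaries produce n + 1 possible categories for the price feature.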
To use a crossed column in a DNN model, wrap it in an embedding column, as in this example:
vertical_id_x_price = crossed_column([vertical_id, bucketized_price], 50000)
vertical_id_x_price_embedded = embedding_column(vertical_id_x_price, 10)
dense_tensor = input_layer(features, [vertical_id_x_price_embedded, ...])
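Conceptually, the embedding column turns each example's sparse bucket ids into one dense vector. The sketch below illustrates that idea in plain Python, assuming the bucket ids of an example are combined by averaging their embedding rows. `make_embedding_table` and `embed_mean` are hypothetical helpers, and the random values stand in for weights that would be learned during training.

```python
import random


def make_embedding_table(num_buckets, dimension, seed=0):
    """Build a [num_buckets, dimension] table of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dimension)]
            for _ in range(num_buckets)]


def embed_mean(bucket_ids, table):
    """Average the embedding rows selected by one example's bucket ids."""
    dimension = len(table[0])
    if not bucket_ids:
        # An example with no crossed values maps to a zero vector in this sketch.
        return [0.0] * dimension
    rows = [table[i] for i in bucket_ids]
    return [sum(row[d] for row in rows) / len(rows) for d in range(dimension)]


# A small table for illustration; the example above would use 50000 buckets.
table = make_embedding_table(num_buckets=50, dimension=10)
dense = embed_mean([3, 17], table)  # one dense vector of length 10
```

The dense vector has a fixed length regardless of how many bucket ids an example has, which is what lets a DNN consume the crossed feature.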
Args
| Name | Description |
|---|---|
| keys | An iterable identifying the features to be crossed. Each element can be either a string (uses the corresponding feature, which must be of string type) or a CategoricalColumn (uses the transformed tensor produced by that column). Hashed categorical columns are not supported. |
| hash_bucket_size | An int > 1. The number of buckets. |
| hash_key | Optional. The hash_key used by the FingerprintCat64 function to combine the cross fingerprints in SparseCrossOp. |
Returns
A CrossedColumn.
Raises
| Exception | Condition |
|---|---|
| ValueError | If len(keys) < 2. |
| ValueError | If any of the keys is neither a string nor CategoricalColumn. |
| ValueError | If any of the keys is HashedCategoricalColumn. |
| ValueError | If hash_bucket_size < 1. |
Last updated 2023-10-06 UTC.