# tf.feature_column.crossed_column

[View source on GitHub](https://github.com/tensorflow/tensorflow/blob/v2.16.1/tensorflow/python/feature_column/feature_column_v2.py#L1642-L1773)

Last updated 2024-04-26 UTC.

Returns a column for performing crosses of categorical features. (deprecated)

**Warning:** tf.feature_column is not recommended for new code. Instead, feature preprocessing can be done directly using either [Keras preprocessing layers](https://www.tensorflow.org/guide/migrate/migrating_feature_columns) or through the one-stop utility [`tf.keras.utils.FeatureSpace`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/FeatureSpace) built on top of them. See the [migration guide](https://tensorflow.org/guide/migrate) for details.

#### View aliases

**Compat aliases for migration**

See the [Migration guide](https://www.tensorflow.org/guide/migrate) for more details.

[`tf.compat.v1.feature_column.crossed_column`](https://www.tensorflow.org/api_docs/python/tf/feature_column/crossed_column)

    tf.feature_column.crossed_column(
        keys, hash_bucket_size, hash_key=None
    )

### Used in the notebooks

Used in the tutorials:

- [Classify structured data with feature columns](https://www.tensorflow.org/tutorials/structured_data/feature_columns)
- [Build a linear model with Estimators](https://www.tensorflow.org/tutorials/estimator/linear)

**Deprecated:** THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use `tf.keras.layers.experimental.preprocessing.HashedCrossing` instead for feature crossing when preprocessing data to train a Keras model.

Crossed features will be hashed according to `hash_bucket_size`. Conceptually, the transformation can be thought of as:

    Hash(cartesian product of features) % hash_bucket_size

For example, if the input features are:

- SparseTensor referred by first key:

      shape = [2, 2]
      {
          [0, 0]: "a"
          [1, 0]: "b"
          [1, 1]: "c"
      }

- SparseTensor referred by second key:

      shape = [2, 1]
      {
          [0, 0]: "d"
          [1, 0]: "e"
      }

then the crossed feature will look like:

    shape = [2, 2]
    {
        [0, 0]: Hash64("d", Hash64("a")) % hash_bucket_size
        [1, 0]: Hash64("e", Hash64("b")) % hash_bucket_size
        [1, 1]: Hash64("e", Hash64("c")) % hash_bucket_size
    }

Here is an example that creates a linear model with crosses of string features:

    keywords_x_doc_terms = crossed_column(['keywords', 'doc_terms'], 50000)
    columns = [keywords_x_doc_terms, ...]
    features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
    linear_prediction = linear_model(features, columns)

You could also use vocabulary lookup before crossing:

    keywords = categorical_column_with_vocabulary_file(
        'keywords', '/path/to/vocabulary/file', vocabulary_size=1000)
    keywords_x_doc_terms = crossed_column([keywords, 'doc_terms'], 50000)
    columns = [keywords_x_doc_terms, ...]
    features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
    linear_prediction = linear_model(features, columns)

If an input feature is of numeric type, you can use `categorical_column_with_identity` or `bucketized_column`, as in this example:

    # vertical_id is an integer categorical feature.
    vertical_id = categorical_column_with_identity('vertical_id', 10000)
    price = numeric_column('price')
    # bucketized_column converts a numerical feature to a categorical one.
    bucketized_price = bucketized_column(price, boundaries=[...])
    vertical_id_x_price = crossed_column([vertical_id, bucketized_price], 50000)
    columns = [vertical_id_x_price, ...]
    features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
    linear_prediction = linear_model(features, columns)

To use a crossed column in a DNN model, you need to wrap it in an embedding column, as in this example:

    vertical_id_x_price = crossed_column([vertical_id, bucketized_price], 50000)
    vertical_id_x_price_embedded = embedding_column(vertical_id_x_price, 10)
    dense_tensor = input_layer(features, [vertical_id_x_price_embedded, ...])

| Args | Description |
|--------------------|-------------|
| `keys` | An iterable identifying the features to be crossed. Each element can be either a string, in which case the corresponding feature is used and must be of string type, or a `CategoricalColumn`, in which case the transformed tensor produced by that column is used. Hashed categorical columns are not supported. |
| `hash_bucket_size` | An int > 1. The number of buckets. |
| `hash_key` | Specify the hash_key that will be used by the `FingerprintCat64` function to combine the crosses fingerprints on SparseCrossOp (optional). |

| Returns |
|---------|
| A `CrossedColumn`. |

| Raises | Condition |
|--------------|-----------------------------------------------------------------|
| `ValueError` | If `len(keys) < 2`. |
| `ValueError` | If any of the keys is neither a string nor `CategoricalColumn`. |
| `ValueError` | If any of the keys is `HashedCategoricalColumn`. |
| `ValueError` | If `hash_bucket_size < 1`. |
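
The conceptual transformation described above, Hash(cartesian product of features) % `hash_bucket_size`, can be illustrated without TensorFlow. The following is a minimal pure-Python sketch under stated assumptions: it uses `hashlib.sha256` as a stand-in hash, whereas TensorFlow combines fingerprints with `FingerprintCat64` inside SparseCrossOp, so the bucket ids produced here will not match TensorFlow's. The helper names `stable_hash` and `cross_example` are hypothetical and exist only for this illustration.

```python
import hashlib
from itertools import product


def stable_hash(values):
    """Deterministic 64-bit hash of a tuple of strings.

    Stand-in only: this is NOT TensorFlow's FingerprintCat64.
    """
    joined = "\x1f".join(values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(joined).digest()[:8], "big")


def cross_example(feature_values, hash_bucket_size):
    """Cross the features of a single example.

    feature_values: one list of string values per feature.
    Returns one bucket id per element of the Cartesian product.
    """
    if len(feature_values) < 2:
        raise ValueError("need at least two features to cross")
    if hash_bucket_size < 1:
        raise ValueError("hash_bucket_size must be at least 1")
    return [
        stable_hash(combo) % hash_bucket_size
        for combo in product(*feature_values)
    ]


# The two-row example from the text:
# first key has rows ["a"] and ["b", "c"]; second key has rows ["d"] and ["e"].
first = [["a"], ["b", "c"]]
second = [["d"], ["e"]]
hash_bucket_size = 50000

crossed = [
    cross_example([f, s], hash_bucket_size) for f, s in zip(first, second)
]
row_lengths = [len(row) for row in crossed]
```

Row 0 yields one bucket id and row 1 yields two, mirroring the three entries of the crossed SparseTensor with shape [2, 2] in the example. Because the ids are reduced modulo `hash_bucket_size`, a smaller bucket count raises the chance that distinct feature combinations collide in the same bucket.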