tf.keras.layers.experimental.preprocessing.CategoryEncoding
Category encoding layer.
tf.keras.layers.experimental.preprocessing.CategoryEncoding(
max_tokens=None, output_mode=BINARY, sparse=False, **kwargs
)
This layer provides options for condensing input data into a categorical encoding.
It accepts integer values as inputs and outputs a dense representation of those
inputs: one sample becomes a rank-1 tensor of float values describing that
sample's tokens.
Examples:
layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
max_tokens=4, output_mode="count")
layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[1., 1., 0., 0.],
[2., 0., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 0., 1.]], dtype=float32)>
Examples with weighted inputs:
layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
max_tokens=4, output_mode="count")
count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]])
layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights)
<tf.Tensor: shape=(4, 4), dtype=float64, numpy=
array([[0.1, 0.2, 0. , 0. ],
[0.2, 0. , 0. , 0. ],
[0. , 0.2, 0.3, 0. ],
[0. , 0.2, 0. , 0.4]])>
Call arguments:
- inputs: A 2D tensor (samples, timesteps).
- count_weights: A 2D tensor with the same shape as inputs, indicating the
  weight for each sample value when summing up in "count" mode. Not used in
  "binary" or "tf-idf" mode.
Attributes:
- max_tokens: The maximum size of the vocabulary for this layer. If None,
  there is no cap on the size of the vocabulary.
- output_mode: Specification for the output of the layer. Defaults to "binary".
  Values can be "binary", "count" or "tf-idf", configuring the layer as follows:
  - "binary": Outputs a single int array per batch, of either vocab_size or
    max_tokens size, containing 1s in all elements where the token mapped to
    that index exists at least once in the batch item.
  - "count": As "binary", but the int array contains a count of the number of
    times the token at that index appeared in the batch item.
  - "tf-idf": As "binary", but the TF-IDF algorithm is applied to find the
    value in each token slot.
- sparse: Boolean. If true, returns a SparseTensor instead of a dense Tensor.
  Defaults to False.
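The modes other than "count" are not exercised in the examples above. The
following is a minimal sketch (not from the original page) contrasting "binary"
output and the sparse option on the same inputs; the commented values follow
from the mode descriptions above.

import tensorflow as tf

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    max_tokens=4, output_mode="binary")
# "binary" records only presence: token 0 appears twice in the second sample,
# but its slot is still 1.
layer([[0, 1], [0, 0], [1, 2], [3, 1]])
# array([[1., 1., 0., 0.],
#        [1., 0., 0., 0.],
#        [0., 1., 1., 0.],
#        [0., 1., 0., 1.]], dtype=float32)

sparse_layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    max_tokens=4, output_mode="count", sparse=True)
# With sparse=True the same counts come back as a tf.SparseTensor
# instead of a dense Tensor.
sparse_layer([[0, 1], [0, 0], [1, 2], [3, 1]])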
Methods
adapt
View source: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/keras/layers/preprocessing/category_encoding.py#L181-L203
adapt(
data, reset_state=True
)
Fits the state of the preprocessing layer to the dataset.
Overrides the default adapt method to apply relevant preprocessing to the
inputs before passing to the combiner.
Arguments:
- data: The data to train on. It can be passed either as a tf.data Dataset or
  as a numpy array.
- reset_state: Optional argument specifying whether to clear the state of the
  layer at the start of the call to adapt. This must be True for this layer,
  which does not support repeated calls to adapt.

Raises:
- RuntimeError: If the layer cannot be adapted at this time.
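When max_tokens is None, the vocabulary size can instead be fitted from data
with adapt. A hedged sketch of that workflow (not part of the original
examples; exact behavior may differ between TF versions):

import numpy as np

data = np.array([[0, 1], [0, 0], [1, 2], [3, 1]])
layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    output_mode="count")   # max_tokens=None, so the size is learned from data
layer.adapt(data)          # fits the layer's state; only a single adapt call is supported
layer(data)                # yields the same (4, 4) count matrix as the example above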
set_num_elements
View source: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/keras/layers/preprocessing/category_encoding.py#L240-L250
set_num_elements(
num_elements
)
set_tfidf_data
View source: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/keras/layers/preprocessing/category_encoding.py#L252-L267
set_tfidf_data(
tfidf_data
)
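Neither set_num_elements nor set_tfidf_data is described here beyond its
signature. The sketch below is a hypothetical reading, based only on the
method names and the attributes above, of setting the layer's state manually
instead of calling adapt; the argument shapes, the call order, and any
preconditions are assumptions, not confirmed by this page.

import numpy as np

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    output_mode="tf-idf")                    # max_tokens=None; state set by hand below
layer.set_num_elements(4)                    # assumption: fixes the output width to 4 tokens
layer.set_tfidf_data(np.array([0.4, 0.3, 0.2, 0.1]))  # assumption: per-token TF-IDF weights
layer([[0, 1], [0, 0], [1, 2], [3, 1]])      # would produce a weighted (4, 4) encoding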