tf.lookup.StaticVocabularyTable
String to Id table that assigns out-of-vocabulary keys to hash buckets.
Inherits From: TrackableResource
tf.lookup.StaticVocabularyTable(
    initializer,
    num_oov_buckets,
    lookup_key_dtype=None,
    name=None,
    experimental_is_anonymous=False
)
For example, if an instance of StaticVocabularyTable is initialized with a string-to-id initializer that maps:
init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(['emerson', 'lake', 'palmer']),
    values=tf.constant([0, 1, 2], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(
    init,
    num_oov_buckets=5)
The table performs the following mapping:

  emerson -> 0
  lake -> 1
  palmer -> 2
  <other term> -> bucket_id

where bucket_id falls between 3 and 3 + num_oov_buckets - 1 = 7, computed as:

  hash(<term>) % num_oov_buckets + vocab_size
For example:

input_tensor = tf.constant(["emerson", "lake", "palmer",
                            "king", "crimson"])
table[input_tensor].numpy()
array([0, 1, 2, 6, 7])
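The lookup rule can be sketched in plain Python. This is a hypothetical illustration: the stand-in hash below is not TF's Fingerprint64, so the exact out-of-vocabulary bucket ids will differ from the real table, but in-vocabulary ids and the OOV range [vocab_size, vocab_size + num_oov_buckets - 1] behave the same way.

```python
import hashlib

# Vocabulary and OOV configuration matching the example above.
VOCAB = {"emerson": 0, "lake": 1, "palmer": 2}
NUM_OOV_BUCKETS = 5

def stand_in_hash(term):
    # Deterministic stand-in for Fingerprint64 (an assumption, not TF's hash).
    return int.from_bytes(hashlib.sha256(term.encode()).digest()[:8], "big")

def lookup(term):
    # In-vocabulary terms get their assigned id; everything else is
    # hashed into one of NUM_OOV_BUCKETS buckets past the vocabulary.
    if term in VOCAB:
        return VOCAB[term]
    return stand_in_hash(term) % NUM_OOV_BUCKETS + len(VOCAB)

print(lookup("lake"))            # 1: in vocabulary
print(3 <= lookup("king") <= 7)  # True: OOV ids stay within [3, 7]
```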
If initializer is None, only out-of-vocabulary buckets are used.
Example usage:
num_oov_buckets = 3
vocab = ["emerson", "lake", "palmer", "crimnson"]
import tempfile
f = tempfile.NamedTemporaryFile(delete=False)
f.write('\n'.join(vocab).encode('utf-8'))
f.close()
init = tf.lookup.TextFileInitializer(
    f.name,
    key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
    value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
table = tf.lookup.StaticVocabularyTable(init, num_oov_buckets)
table.lookup(tf.constant(["palmer", "crimnson", "king",
                          "tarkus", "black", "moon"])).numpy()
array([2, 3, 5, 6, 6, 4])
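The WHOLE_LINE / LINE_NUMBER pairing above amounts to mapping each line of the vocabulary file to its zero-based line number. A pure-Python sketch of that rule (illustrative only, not the TF implementation):

```python
# Each whole line of the vocab file becomes a key; its zero-based
# line number becomes the value (sketch of WHOLE_LINE/LINE_NUMBER).
vocab_lines = ["emerson", "lake", "palmer", "crimnson"]
mapping = {line: number for number, line in enumerate(vocab_lines)}
print(mapping["palmer"])  # 2, matching table.lookup above
```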
The hash function used to generate out-of-vocabulary bucket IDs is Fingerprint64.
Note that the out-of-vocabulary bucket IDs always range from the table size
up to size + num_oov_buckets - 1
regardless of the table values, which could
cause unexpected collisions:
init = tf.lookup.KeyValueTensorInitializer(
    keys=tf.constant(["emerson", "lake", "palmer"]),
    values=tf.constant([1, 2, 3], dtype=tf.int64))
table = tf.lookup.StaticVocabularyTable(
    init,
    num_oov_buckets=1)
input_tensor = tf.constant(["emerson", "lake", "palmer", "king"])
table[input_tensor].numpy()
array([1, 2, 3, 3])
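The collision above can be reasoned through without running TF: the table size is 3, so with num_oov_buckets=1 every out-of-vocabulary key maps to id 3 + (hash % 1) = 3, the same id the initializer assigned to "palmer". A minimal sketch, assuming the same bucketing rule:

```python
# With num_oov_buckets=1, hash(term) % 1 is always 0, so every
# out-of-vocabulary key lands on id vocab_size = 3 -- the same id
# the initializer gave to "palmer", hence the collision.
VALUES = {"emerson": 1, "lake": 2, "palmer": 3}
NUM_OOV_BUCKETS = 1

def oov_id(term):
    # The hash value is irrelevant here: anything % 1 == 0.
    return 0 % NUM_OOV_BUCKETS + len(VALUES)

print(oov_id("king"))                      # 3
print(oov_id("king") == VALUES["palmer"])  # True: collision
```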
Args

  initializer: A TableInitializerBase object that contains the data used to
    initialize the table. If None, only out-of-vocab buckets are used.
  num_oov_buckets: Number of buckets to use for out-of-vocabulary keys. Must
    be greater than zero. If out-of-vocab buckets are not required, use
    StaticHashTable instead.
  lookup_key_dtype: Data type of keys passed to lookup. Defaults to
    initializer.key_dtype if initializer is specified, otherwise tf.string.
    Must be string or integer, and must be castable to initializer.key_dtype.
  name: A name for the operation (optional).
  experimental_is_anonymous: Whether to use anonymous mode for the table
    (default is False). In anonymous mode, the table resource can only be
    accessed via a resource handle; it cannot be looked up by name. When all
    resource handles pointing to the resource are gone, the resource is
    deleted automatically.
Raises

  ValueError: If num_oov_buckets is not positive.
  TypeError: If lookup_key_dtype or initializer.key_dtype is not integer or
    string, or if initializer.value_dtype is not int64.
Attributes

  key_dtype: The table key dtype.
  name: The name of the table.
  resource_handle: The resource handle associated with this Resource.
  value_dtype: The table value dtype.
Methods
lookup

lookup(
    keys, name=None
)

Looks up keys in the table and outputs the corresponding values. Out-of-vocabulary keys are assigned to buckets based on their hashes.
Args

  keys: Keys to look up. May be either a SparseTensor or a dense Tensor.
  name: Optional name for the op.

Returns

  A SparseTensor if keys is sparse, a RaggedTensor if keys is ragged,
  otherwise a dense Tensor.

Raises

  TypeError: If keys does not match the table key data type.
size
size(
    name=None
)

Computes the number of elements in this table.
__getitem__
__getitem__(
    keys
)

Looks up keys in the table and outputs the corresponding values.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2023-10-06 UTC.