tff.analytics.heavy_hitters.iblt.IbltEncoder
Stay organized with collections
Save and categorize content based on your preferences.
Encodes the strings into an IBLT data structure.
tff.analytics.heavy_hitters.iblt.IbltEncoder(
capacity,
string_max_bytes,
*,
encoding: tff.analytics.heavy_hitters.iblt.CharacterEncoding
= tff.analytics.heavy_hitters.iblt.CharacterEncoding.UTF8
,
drop_strings_above_max_length=False,
seed=0,
repetitions=DEFAULT_REPETITIONS,
hash_family=None,
hash_family_params=None,
field_size=DEFAULT_FIELD_SIZE
)
The IBLT is a numpy array of shape [repetitions, table_size, num_chunks+2].
Its value at index (r, h, c)
corresponds to (r
is a repetition):
sum of chunk c
of keys hashing to h
in r
if c < num_chunks
,
sum of counts of keys hashing to h
in r
if c = num_chunks
,
sum of checks of keys hashing to h
in r
if c = num_chunks + 1
.
Args |
capacity
|
Number of distinct strings that we expect to be inserted.
|
string_max_bytes
|
Maximum length of a string in bytesthat can be inserted.
|
encoding
|
The character encoding of the string data to encode. For
non-character binary data or strings with unknown encoding, specify
CharacterEncoding.UNKNOWN . Defaults to CharacterEncoding.UTF8 .
|
drop_strings_above_max_length
|
If True, strings above string_max_bytes
will be dropped when constructing the IBLT. Defaults to False.
|
seed
|
Integer seed for hash functions. Defaults to 0.
|
repetitions
|
Number of repetitions in IBLT data structure (must be >= 3).
Defaults to 3.
|
hash_family
|
String specifying the hash family to use to construct IBLT.
(options include coupled or random, default is chosen based on capacity)
|
hash_family_params
|
A dict of parameters that the hash family hasher
expects. (defaults are chosen based on capacity.)
|
field_size
|
The field size for all values in IBLT. Defaults to 2**31 - 1.
|
Methods
compute_chunks
View source
compute_chunks(
input_strings
)
Returns Tensor containing integer chunks for input strings.
Args |
input_strings
|
A tensor of strings.
|
Returns |
A 2D tensor with rows consisting of integer chunks corresponding to the
string indexed by the row and a trimmed input_strings that can fit in
the IBLT.
|
compute_iblt
View source
@tf.function
compute_iblt(
input_strings, input_counts=None
)
Returns Tensor containing the values of the IBLT data structure.
Args |
input_strings
|
A 1D tensor of strings.
|
input_counts
|
A 1D tensor of tf.int64 representing the count of each
string.
|
Returns |
A tensor of shape [repetitions, table_size, num_chunks+2] whose value at
index (r, h, c) corresponds to chunk c of the keys if c < num_chunks, to
the counts if c = num_chunks, and to the checks if c = num_chunks + 1.
|
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-20 UTC.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-09-20 UTC."],[],[]]