tff.analytics.heavy_hitters.iblt.build_iblt_computation

Builds the tff.Computation for heavy-hitters discovery with IBLT.

string_max_bytes The maximum length in bytes of a string in the IBLT. Strings longer than string_max_bytes will be truncated. Defaults to 10. Must be positive.
max_words_per_user The maximum total count each client is allowed to contribute across all words. If not None, must be a positive integer. Defaults to None, which means each client contributes all of its words. Note that this does not cap the count of each individual word a client can contribute. Set multi_contribution=False to restrict the per-client count for each word.
multi_contribution Whether each client is allowed to contribute multiple instances of each string, or only a count of one for each unique word. Defaults to True, meaning clients contribute the full count for each contributed string. Note that this doesn't limit the total number of strings each client can contribute. Set max_words_per_user to limit the total number of strings per client.
capacity The capacity of the IBLT sketch. Set it to the maximum number of unique strings expected across all clients in a single iteration. The more the actual number of unique strings exceeds the configured capacity, the more likely it is that the IBLT will fail to decode results. Capacity also determines the size of the IBLT data structure: large values increase client memory usage and network I/O. If you don't have an estimate for the number of unique strings but can tolerate some resource overhead, start with a high capacity and then tune it down based on heavy hitter results. Defaults to 1000.
k_anonymity Only return words contributed by at least k clients in a single iteration. Must be a positive integer. Defaults to 1.
max_heavy_hitters The maximum number of items to return. If the decoded results contain more than this number of items, the strings are sorted by estimated count in decreasing order and only the top max_heavy_hitters items are returned. Defaults to None, which means all heavy hitters in the result are returned.
string_postprocessor A callable function that is run after strings are decoded from the IBLT in order to postprocess them. It should accept a single string tensor and output a single string tensor of the same shape. If None, no postprocessing is done.
secure_sum_bitwidth The bitwidth used for federated secure sum. The default value is None, which disables secure sum. If not None, must be in the range [1,62]. Note that when this parameter is not None, the IBLT sketches are summed via tff.backends.mapreduce.federated_secure_modular_sum with modulus equal to IBLT's default field size, and other values (client count, string count tensor) are aggregated via federated_secure_sum with max_input=2**secure_sum_bitwidth - 1.
decode_iblt_fn A function to decode key-value pairs from an IBLT sketch. Defaults to None, in which case decode_iblt_fn is set to iblt.decode_iblt_tf.
seed An integer seed for hash functions. Defaults to 0.
batch_size The number of elements in each batch of the dataset. Batching is an optimization for pulling multiple inputs at a time from the input tf.data.Dataset, amortizing the overhead cost of each read over batch_size elements. Consider batching if you observe poor client execution performance or if reading inputs is particularly expensive. Defaults to 1, which means the input dataset is processed by tf.data.Dataset.batch(1). Must be positive.
repetitions The number of repetitions in the IBLT data structure. This sets the number of hash functions to use in the IBLT. Additional repetitions will significantly decrease the likelihood of decoding failures, at the expense of multiplying the size of the data structure. Most callers should not override the default. Defaults to 3. Must be at least 3.
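
The interaction between string_max_bytes, max_words_per_user and multi_contribution can be easier to see on a concrete input. The following is a minimal plain-Python sketch of the per-client semantics as described above; it is not TFF code. The function name client_contribution is hypothetical, and the order in which occurrences are kept when the max_words_per_user cap binds (first-seen order here) is an assumption made purely for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def client_contribution(
    words: List[str],
    string_max_bytes: int = 10,
    max_words_per_user: Optional[int] = None,
    multi_contribution: bool = True,
) -> Dict[bytes, int]:
    """Illustrative model of one client's contribution (hypothetical helper)."""
    # Strings longer than string_max_bytes are truncated at the byte level.
    truncated = [w.encode("utf-8")[:string_max_bytes] for w in words]
    counts = Counter(truncated)

    if not multi_contribution:
        # Each unique string contributes a count of one for this client.
        counts = Counter({w: 1 for w in counts})

    if max_words_per_user is not None:
        # Cap the total count across all words, not the count per word.
        # Assumption: keep counts in first-seen order until the budget runs out.
        capped: Counter = Counter()
        remaining = max_words_per_user
        for word, count in counts.items():
            take = min(count, remaining)
            if take == 0:
                break
            capped[word] = take
            remaining -= take
        counts = capped

    return dict(counts)


words = ["hello", "hello", "hello", "internationalization", "world"]
print(client_contribution(words))
print(client_contribution(words, multi_contribution=False))
print(client_contribution(words, max_words_per_user=4))
```

In this sketch, max_words_per_user=4 still lets "hello" contribute a count of 3, which matches the note that the cap applies to the total rather than to each word; only multi_contribution=False reduces every per-word count to one.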

Returns: A tff.Computation that performs federated heavy hitter discovery.
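
To illustrate how k_anonymity and max_heavy_hitters shape the final result, here is a rough plain-Python sketch of the filtering they describe. It aggregates per-client dictionaries directly instead of encoding and decoding an IBLT sketch, so it only models the documented semantics. The function name select_heavy_hitters and the alphabetical tie-breaking rule are assumptions, not part of the library.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def select_heavy_hitters(
    client_counts: List[Dict[str, int]],
    k_anonymity: int = 1,
    max_heavy_hitters: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Illustrative model of the result filtering (hypothetical helper)."""
    total: Counter = Counter()    # summed count per string
    clients: Counter = Counter()  # number of distinct clients per string
    for contribution in client_counts:
        for word, count in contribution.items():
            total[word] += count
            clients[word] += 1

    # Keep only strings contributed by at least k_anonymity clients.
    kept = [(w, c) for w, c in total.items() if clients[w] >= k_anonymity]

    # Sort by count, largest first. Assumption: ties are broken alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))

    if max_heavy_hitters is not None:
        kept = kept[:max_heavy_hitters]
    return kept


round_data = [{"a": 3, "b": 1}, {"a": 1, "c": 5}, {"b": 2, "a": 1}]
print(select_heavy_hitters(round_data))
print(select_heavy_hitters(round_data, k_anonymity=2))
print(select_heavy_hitters(round_data, k_anonymity=2, max_heavy_hitters=1))
```

Note that "c" has a high total count but comes from a single client, so it is dropped as soon as k_anonymity is raised to 2.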

Raises: ValueError if any parameter violates the constraints described above.
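
The documented constraints can be collected into a single pre-flight check. The sketch below is a hypothetical helper (check_iblt_args is not a TFF function) that mirrors only the constraints stated in the parameter descriptions; the library's own validation may check more than this or report different messages.

```python
from typing import Optional


def check_iblt_args(
    string_max_bytes: int = 10,
    max_words_per_user: Optional[int] = None,
    k_anonymity: int = 1,
    secure_sum_bitwidth: Optional[int] = None,
    batch_size: int = 1,
    repetitions: int = 3,
) -> None:
    """Raises ValueError if a documented constraint is violated (hypothetical helper)."""
    if string_max_bytes < 1:
        raise ValueError("string_max_bytes must be positive")
    if max_words_per_user is not None and max_words_per_user < 1:
        raise ValueError("max_words_per_user must be None or a positive integer")
    if k_anonymity < 1:
        raise ValueError("k_anonymity must be a positive integer")
    if secure_sum_bitwidth is not None and not 1 <= secure_sum_bitwidth <= 62:
        raise ValueError("secure_sum_bitwidth must be None or in the range [1, 62]")
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if repetitions < 3:
        raise ValueError("repetitions must be at least 3")


check_iblt_args()  # the documented defaults pass

try:
    check_iblt_args(repetitions=2)
except ValueError as err:
    print(f"rejected: {err}")
```

Running a check like this before building the computation can surface configuration mistakes early, for example in a config loader, without constructing any TFF objects.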