View source on GitHub |
Entropy model for power-law distributed random variables.
tfc.entropy_models.PowerLawEntropyModel(
coding_rank, alpha=0.01, bottleneck_dtype=None
)
This entropy model handles quantization and compression of a bottleneck tensor and implements a penalty that encourages compressibility under the Elias gamma code.
The gamma code has code lengths 1 + 2 floor(log_2(x))
, for x
a positive
integer, and is close to optimal if x
is distributed according to a power
law. Being a universal code, it also guarantees that in the worst case, the
expected code length is no more than 3 times the entropy of the empirical
distribution of x
, as long as probability decreases with increasing x
. For
details on the gamma code, see:
"Universal Codeword Sets and Representations of the Integers"
P. Elias
https://doi.org/10.1109/TIT.1975.1055349
Given a signed integer, run_length_gamma_encode
encodes zeros using a
run-length code, the sign using a uniform bit, and applies the gamma code to
the magnitude.
The penalty applied by this class is given by:
log((abs(x) + alpha) / alpha)
This encourages x
to follow a symmetrized power law, but only approximately
for alpha > 0
. Without alpha
, the penalty would have a singularity at
zero. Setting alpha
to a small positive value ensures that the penalty is
non-negative, and that its gradients are useful for optimization.
Args | |
---|---|
coding_rank
|
Integer. Number of innermost dimensions considered a coding
unit. Each coding unit is compressed to its own bit string, and the
estimated rate is summed over each coding unit in bits() .
|
alpha
|
Float. Regularization parameter preventing gradient singularity around zero. |
bottleneck_dtype
|
tf.dtypes.DType . Data type of bottleneck tensor.
Defaults to tf.keras.mixed_precision.global_policy().compute_dtype .
|
Attributes | |
---|---|
alpha
|
Alpha parameter. |
bottleneck_dtype
|
Data type of the bottleneck tensor. |
coding_rank
|
Number of innermost dimensions considered a coding unit. |
name
|
Returns the name of this module as passed or determined in the ctor. |
name_scope
|
Returns a tf.name_scope instance for this class.
|
non_trainable_variables
|
Sequence of non-trainable variables owned by this module and its submodules. |
submodules
|
Sequence of all sub-modules.
Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).
|
trainable_variables
|
Sequence of trainable variables owned by this module and its submodules. |
variables
|
Sequence of variables owned by this module and its submodules. |
Methods
compress
compress(
bottleneck
)
Compresses a floating-point tensor.
Compresses the tensor to bit strings. bottleneck
is first quantized
as in quantize()
, and then compressed using the run-length gamma code. The
quantized tensor can later be recovered by calling decompress()
.
The innermost self.coding_rank
dimensions are treated as one coding unit,
i.e. are compressed into one string each. Any additional dimensions to the
left are treated as batch dimensions.
Args | |
---|---|
bottleneck
|
tf.Tensor containing the data to be compressed. Must have at
least self.coding_rank dimensions.
|
Returns | |
---|---|
A tf.Tensor having the same shape as bottleneck without the
self.coding_rank innermost dimensions, containing a string for each
coding unit.
|
decompress
decompress(
strings, code_shape
)
Decompresses a tensor.
Reconstructs the quantized tensor from bit strings produced by compress()
.
Args | |
---|---|
strings
|
tf.Tensor containing the compressed bit strings.
|
code_shape
|
Shape of innermost dimensions of the output tf.Tensor .
|
Returns | |
---|---|
A tf.Tensor of shape tf.shape(strings) + code_shape .
|
penalty
penalty(
bottleneck
)
Computes penalty encouraging compressibility.
Args | |
---|---|
bottleneck
|
tf.Tensor containing the data to be compressed. Must have at
least self.coding_rank dimensions.
|
Returns | |
---|---|
Penalty value, which has the same shape as bottleneck without the
self.coding_rank innermost dimensions.
|
quantize
quantize(
bottleneck
)
Quantizes a floating-point bottleneck tensor.
The tensor is rounded to integer values. The gradient of this rounding operation is overridden with the identity (straight-through gradient estimator).
Args | |
---|---|
bottleneck
|
tf.Tensor containing the data to be quantized.
|
Returns | |
---|---|
A tf.Tensor containing the quantized values.
|
with_name_scope
@classmethod
with_name_scope( method )
Decorator to automatically enter the module name scope.
class MyModule(tf.Module):
@tf.Module.with_name_scope
def __call__(self, x):
if not hasattr(self, 'w'):
self.w = tf.Variable(tf.random.normal([x.shape[1], 3]))
return tf.matmul(x, self.w)
Using the above module would produce tf.Variable
s and tf.Tensor
s whose
names included the module name:
mod = MyModule()
mod(tf.ones([1, 2]))
<tf.Tensor: shape=(1, 3), dtype=float32, numpy=..., dtype=float32)>
mod.w
<tf.Variable 'my_module/Variable:0' shape=(2, 3) dtype=float32,
numpy=..., dtype=float32)>
Args | |
---|---|
method
|
The method to wrap. |
Returns | |
---|---|
The original method wrapped such that it enters the module's name scope. |
__call__
__call__(
bottleneck
)
Perturbs a tensor with (quantization) noise and computes penalty.
Args | |
---|---|
bottleneck
|
tf.Tensor containing the data to be compressed. Must have at
least self.coding_rank dimensions.
|
Returns | |
---|---|
A tuple (self.quantize(bottleneck), self.penalty(bottleneck)) .
|