View source on GitHub
Functional interface for the group normalization layer.
tf.contrib.layers.group_norm(
inputs, groups=32, channels_axis=-1, reduction_axes=(-3, -2), center=True,
scale=True, epsilon=1e-06, activation_fn=None, param_initializers=None,
reuse=None, variables_collections=None, outputs_collections=None,
trainable=True, scope=None, mean_close_to_zero=False
)
Reference: https://arxiv.org/abs/1803.08494
"Group Normalization", Yuxin Wu, Kaiming He
| Args | |
|---|---|
| inputs | A Tensor with at least 2 dimensions, one of which is channels. All shape dimensions except for batch must be fully defined. |
| groups | Integer. Divide the channels into this number of groups, over which normalization statistics are computed. This number must be commensurate with the number of channels in inputs. |
| channels_axis | An integer. Specifies the index of the channels axis, which will be broken into groups, each of whose statistics will be computed across the reduction axes. Must be mutually exclusive with reduction_axes. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included. |
| reduction_axes | Tuple of integers. Specifies dimensions over which statistics will be accumulated. Must be mutually exclusive with channels_axis. Statistics will not be accumulated across axes not specified in reduction_axes or channels_axis. Preferred usage is to specify negative integers to be agnostic to whether a batch dimension is included. Sample usage: NHWC format: channels_axis=-1, reduction_axes=[-3, -2]; NCHW format: channels_axis=-3, reduction_axes=[-2, -1] (see the usage sketch above). |
| center | If True, add offset of beta to normalized tensor. If False, beta is ignored. |
| scale | If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer. |
| epsilon | Small float added to variance to avoid dividing by zero. |
| activation_fn | Activation function, default set to None to skip it and maintain a linear activation. |
| param_initializers | Optional initializers for beta, gamma, moving mean and moving variance. |
| reuse | Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given. |
| variables_collections | Optional collections for the variables. |
| outputs_collections | Collections to add the outputs to. |
| trainable | If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable). |
| scope | Optional scope for variable_scope. |
| mean_close_to_zero | The mean of input before ReLU will be close to zero when batch size >= 4k for ResNet-50 on TPU. If True, use nn.sufficient_statistics and nn.normalize_moments to calculate the variance; this is the same behavior as fused=True in batch normalization. If False, use nn.moments to calculate the variance. When the mean is close to zero, e.g. 1e-4, using the mean to calculate the variance may give poor results due to repeated roundoff error and denormalization. When the mean is large, e.g. 1e2, sum(input^2) is so large that only the high-order digits of the elements are accumulated; thus, calculating the variance as sum((input - mean)^2)/n has better accuracy than sum(input^2)/n - mean^2 when the mean is large (see the sketch after this table). |
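The mean_close_to_zero note above comes down to the difference between the one-pass and two-pass variance estimators. A small sketch in NumPy, purely for illustration (the layer itself uses nn.moments, or nn.sufficient_statistics with nn.normalize_moments, as described above):

```python
import numpy as np

# Float32 data with a large mean (~1e2) and a tiny true variance (~1e-6),
# mimicking the "mean is large" case described for mean_close_to_zero.
rng = np.random.RandomState(0)
x = (100.0 + 1e-3 * rng.randn(1000000)).astype(np.float32)
mean = x.mean()

# One-pass estimator: sum(input^2)/n - mean^2. The squares are ~1e4, so
# float32 keeps only their high-order digits and the 1e-6 signal is lost.
var_one_pass = np.mean(x * x) - mean * mean

# Two-pass estimator: sum((input - mean)^2)/n. Subtracting the mean first
# keeps the accumulated terms small, so precision is preserved.
var_two_pass = np.mean((x - mean) ** 2)

print(var_one_pass)  # typically far from 1e-6, and can even be negative
print(var_two_pass)  # close to the true value of about 1e-6
```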
| Returns | |
|---|---|
| A Tensor representing the output of the operation. |
| Raises | |
|---|---|
| ValueError | If the rank of inputs is undefined. |
| ValueError | If rank or channels dimension of inputs is undefined. |
| ValueError | If number of groups is not commensurate with number of channels. |
| ValueError | If reduction_axes or channels_axis are out of bounds. |
| ValueError | If reduction_axes are not mutually exclusive with channels_axis. |
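As an example of the "not commensurate" error above, a group count that does not evenly divide the channel count fails at graph-construction time (a sketch, assuming TF 1.x; the exact error message may differ):

```python
import tensorflow as tf  # TF 1.x, where tf.contrib.layers is available

x = tf.placeholder(tf.float32, shape=[8, 56, 56, 30])  # 30 channels
try:
    tf.contrib.layers.group_norm(x, groups=32)  # 32 does not divide 30
except ValueError as err:
    print(err)
```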