"Layer Normalization"
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
and is applied before the internal nonlinearities.
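As a rough sketch of that transformation (a NumPy illustration, not the
cell's actual implementation; the helper name is hypothetical), each
pre-activation vector is normalized to zero mean and unit variance and then
rescaled by the learned gain and shift that norm_gain and norm_shift
initialize:

    import numpy as np

    def layer_norm(x, gain=1.0, shift=0.0, eps=1e-6):
        # Normalize over the feature axis, then apply the learned
        # gain and shift (initialized by norm_gain / norm_shift).
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gain * (x - mean) / np.sqrt(var + eps) + shift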
Args
num_units
int, The number of units in the LSTM cell.
use_peepholes
bool, set True to enable diagonal/peephole connections.
initializer
(optional) The initializer to use for the weight and
projection matrices.
num_proj
(optional) int, The output dimensionality for the projection
matrices. If None, no projection is performed.
proj_clip
(optional) A float value. If num_proj > 0 and proj_clip is
provided, then the projected values are clipped elementwise to within
[-proj_clip, proj_clip].
num_unit_shards
How to split the weight matrix. If >1, the weight
matrix is partitioned and stored across num_unit_shards shards.
num_proj_shards
How to split the projection matrix. If >1, the
projection matrix is partitioned and stored across num_proj_shards shards.
forget_bias
Biases of the forget gate are initialized by default to 1
in order to reduce the scale of forgetting at the beginning of
training.
state_is_tuple
If True (the default), accepted and returned states are 2-tuples of
the c_state and m_state. If False, they are concatenated
along the column axis; the concatenated form will soon be deprecated.
activation
Activation function of the inner states.
reuse
(optional) Python boolean describing whether to reuse variables
in an existing scope. If not True, and the existing scope already has
the given variables, an error is raised.
layer_norm
If True, layer normalization will be applied.
norm_gain
float, The layer normalization gain initial value. If
layer_norm has been set to False, this argument will be ignored.
norm_shift
float, The layer normalization shift initial value. If
layer_norm has been set to False, this argument will be ignored.
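A minimal usage sketch, assuming TensorFlow 1.x with tf.contrib available;
the shapes and hyperparameters below are illustrative, not prescribed:

    import tensorflow as tf

    cell = tf.contrib.rnn.CoupledInputForgetGateLSTMCell(
        num_units=64, use_peepholes=True, layer_norm=True)

    # Illustrative batch of 8 sequences, 20 time steps, 32 features.
    inputs = tf.placeholder(tf.float32, [8, 20, 32])
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)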
Attributes
graph
DEPRECATED. This property will be removed in a future version.
Instructions for updating: stop using this property, because tf.layers
layers no longer track their graph.
output_size
Integer or TensorShape: size of outputs produced by this cell.
scope_name
state_size
size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers
or TensorShapes.
Methods
get_initial_state
get_initial_state(inputs=None, batch_size=None, dtype=None)
zero_state
zero_state(batch_size, dtype)
Return zero-filled state tensor(s).
Args
batch_size
int, float, or unit Tensor representing the batch size.
dtype
the data type to use for the state.
Returns
If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size, state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size, s] for each s in state_size.
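A short sketch of zero_state, continuing the construction example above
(the batch size and dtype are illustrative):

    # With state_is_tuple=True (the default) and num_proj=None, zero_state
    # returns an LSTMStateTuple of two [batch_size, num_units] zero tensors.
    init_state = cell.zero_state(batch_size=8, dtype=tf.float32)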
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2020-10-01 UTC."],[],[],null,["# tf.contrib.rnn.CoupledInputForgetGateLSTMCell\n\n\u003cbr /\u003e\n\n|---------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/rnn/python/ops/rnn_cell.py#L98-L322) |\n\nLong short-term memory unit (LSTM) recurrent network cell.\n\nInherits From: [`RNNCell`](../../../tf/nn/rnn_cell/RNNCell) \n\n tf.contrib.rnn.CoupledInputForgetGateLSTMCell(\n num_units, use_peepholes=False, initializer=None, num_proj=None, proj_clip=None,\n num_unit_shards=1, num_proj_shards=1, forget_bias=1.0, state_is_tuple=True,\n activation=tf.math.tanh, reuse=None, layer_norm=False, norm_gain=1.0,\n norm_shift=0.0\n )\n\nThe default non-peephole implementation is based on:\n\n\u003chttps://pdfs.semanticscholar.org/1154/0131eae85b2e11d53df7f1360eeb6476e7f4.pdf\u003e\n\nFelix Gers, Jurgen Schmidhuber, and Fred Cummins.\n\"Learning to forget: Continual prediction with LSTM.\" IET, 850-855, 1999.\n\nThe peephole implementation is based on:\n\n\u003chttps://research.google.com/pubs/archive/43905.pdf\u003e\n\nHasim Sak, Andrew Senior, and Francoise Beaufays.\n\"Long short-term memory recurrent neural network architectures for\nlarge scale acoustic modeling.\" INTERSPEECH, 2014.\n\nThe coupling of input and forget gate is based on:\n\n\u003chttp://arxiv.org/pdf/1503.04069.pdf\u003e\n\nGreff et al. \"LSTM: A Search Space Odyssey\"\n\nThe class uses optional peep-hole connections, and an optional projection\nlayer.\nLayer normalization implementation is based on:\n\n\u003chttps://arxiv.org/abs/1607.06450\u003e\n\n\"Layer Normalization\"\nJimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton\n\nand is applied before the internal nonlinearities.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|-------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `num_units` | int, The number of units in the LSTM cell |\n| `use_peepholes` | bool, set True to enable diagonal/peephole connections. |\n| `initializer` | (optional) The initializer to use for the weight and projection matrices. |\n| `num_proj` | (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed. |\n| `proj_clip` | (optional) A float value. If `num_proj \u003e 0` and `proj_clip` is provided, then the projected values are clipped elementwise to within `[-proj_clip, proj_clip]`. |\n| `num_unit_shards` | How to split the weight matrix. If \\\u003e1, the weight matrix is stored across num_unit_shards. |\n| `num_proj_shards` | How to split the projection matrix. If \\\u003e1, the projection matrix is stored across num_proj_shards. |\n| `forget_bias` | Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training. 
|\n| `state_is_tuple` | If True, accepted and returned states are 2-tuples of the `c_state` and `m_state`. By default (False), they are concatenated along the column axis. This default behavior will soon be deprecated. |\n| `activation` | Activation function of the inner states. |\n| `reuse` | (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised. |\n| `layer_norm` | If `True`, layer normalization will be applied. |\n| `norm_gain` | float, The layer normalization gain initial value. If `layer_norm` has been set to `False`, this argument will be ignored. |\n| `norm_shift` | float, The layer normalization shift initial value. If `layer_norm` has been set to `False`, this argument will be ignored. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Attributes ---------- ||\n|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `graph` | DEPRECATED FUNCTION \u003cbr /\u003e | **Warning:** THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Stop using this property because tf.layers layers no longer track their graph. |\n| `output_size` | Integer or TensorShape: size of outputs produced by this cell. |\n| `scope_name` | \u003cbr /\u003e |\n| `state_size` | size(s) of state(s) used by this cell. \u003cbr /\u003e It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes. |\n\n\u003cbr /\u003e\n\nMethods\n-------\n\n### `get_initial_state`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/python/ops/rnn_cell_impl.py#L281-L309) \n\n get_initial_state(\n inputs=None, batch_size=None, dtype=None\n )\n\n### `zero_state`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/python/ops/rnn_cell_impl.py#L311-L340) \n\n zero_state(\n batch_size, dtype\n )\n\nReturn zero-filled state tensor(s).\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|--------------|---------------------------------------------------------|\n| `batch_size` | int, float, or unit Tensor representing the batch size. |\n| `dtype` | the data type to use for the state. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| If `state_size` is an int or TensorShape, then the return value is a `N-D` tensor of shape `[batch_size, state_size]` filled with zeros. \u003cbr /\u003e If `state_size` is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of `2-D` tensors with the shapes `[batch_size, s]` for each s in `state_size`. ||\n\n\u003cbr /\u003e"]]