Implements Bahdanau-style (additive) attention.
tf.contrib.seq2seq.BahdanauAttention(
    num_units, memory, memory_sequence_length=None, normalize=False,
    probability_fn=None, score_mask_value=None, dtype=None,
    custom_key_value_fn=None, name='BahdanauAttention'
)
This attention has two forms. The first is Bahdanau attention, as described in:
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate." ICLR 2015. https://arxiv.org/abs/1409.0473
The second is the normalized form. This form is inspired by the weight normalization article:
Tim Salimans, Diederik P. Kingma. "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks." https://arxiv.org/abs/1602.07868
To enable the second form, construct the object with `normalize=True`.
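The two forms differ only in how the score is computed. The following NumPy sketch assumes the formulation used in TensorFlow's implementation: `memory_layer` and `query_layer` are bias-free projections to `num_units`, the plain score is `sum(v * tanh(keys + processed_query))`, and the normalized form replaces `v` with `g * v / ||v||` and adds a bias `b` inside the `tanh`. The function `bahdanau_score`, the matrices `W_memory` and `W_query`, and the random inputs are hypothetical stand-ins, not the library's variables.

```python
import numpy as np

def bahdanau_score(processed_query, keys, v, normalize=False, g=None, b=None):
    """Additive attention score (hypothetical helper).

    processed_query: [batch_size, num_units]            (query after the query layer)
    keys:            [batch_size, max_time, num_units]  (memory after the memory layer)
    v:               [num_units]
    Returns scores of shape [batch_size, max_time].
    """
    # Broadcast the query across the time axis: [batch_size, 1, num_units].
    q = processed_query[:, np.newaxis, :]
    if normalize:
        # Weight-normalized form: v is rescaled to have norm g, and a bias b is added.
        normed_v = g * v / np.sqrt(np.sum(v ** 2))
        return np.sum(normed_v * np.tanh(keys + q + b), axis=2)
    # Plain Bahdanau form.
    return np.sum(v * np.tanh(keys + q), axis=2)

rng = np.random.default_rng(0)
batch_size, max_time, memory_depth, query_depth, num_units = 2, 5, 6, 4, 3
memory = rng.standard_normal((batch_size, max_time, memory_depth))
query = rng.standard_normal((batch_size, query_depth))

# Hypothetical bias-free projections standing in for memory_layer / query_layer.
W_memory = rng.standard_normal((memory_depth, num_units))
W_query = rng.standard_normal((query_depth, num_units))
v = rng.standard_normal(num_units)

keys = memory @ W_memory            # [batch_size, max_time, num_units]
processed_query = query @ W_query   # [batch_size, num_units]

plain = bahdanau_score(processed_query, keys, v)
# Initial values assumed here: g = sqrt(1 / num_units), b = 0.
normed = bahdanau_score(processed_query, keys, v, normalize=True,
                        g=np.sqrt(1.0 / num_units), b=np.zeros(num_units))
print(plain.shape, normed.shape)  # (2, 5) (2, 5)
```

Setting `g` to the norm of `v` and `b` to zeros makes the normalized score coincide with the plain one, which shows that the normalized form only reparameterizes `v` (plus the extra bias `b`).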
| Args | |
|---|---|
| num_units | The depth of the query mechanism, i.e. the output size of the query and memory projection layers. | 
| memory | The memory to query; usually the output of an RNN encoder. This tensor should be shaped `[batch_size, max_time, ...]`. | 
| memory_sequence_length | (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths. | 
| normalize | Python boolean. Whether to normalize the energy term. | 
| probability_fn | (optional) A callable that converts the score to probabilities. The default is `tf.nn.softmax`. Other options include `tf.contrib.seq2seq.hardmax` and `tf.contrib.sparsemax.sparsemax`. Its signature should be: `probabilities = probability_fn(score)`. | 
| score_mask_value | (optional) The value assigned to masked score entries before the score is passed into `probability_fn`. The default is `-inf`. Only used if `memory_sequence_length` is not `None`. | 
| dtype | The data type for the query and memory layers of the attention mechanism. | 
| custom_key_value_fn | (optional) A custom function for computing keys and values. | 
| name | Name to use when creating ops. | 
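The masking arguments interact as sketched below in NumPy, assuming that `memory_sequence_length` masks both the memory rows (which become `values`) and the score columns: positions at or past each sequence length receive `score_mask_value` before `probability_fn` runs, so with the default `-inf` they get zero probability under softmax. The helpers `sequence_mask`, `mask_memory`, and `masked_probabilities` are hypothetical, written only for this example, and the NumPy `hardmax` merely imitates `tf.contrib.seq2seq.hardmax`.

```python
import numpy as np

def softmax(score):
    # Numerically stable softmax over the last axis; -inf entries get probability 0.
    e = np.exp(score - np.max(score, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def hardmax(score):
    # One-hot vector at the argmax of each row.
    out = np.zeros_like(score)
    np.put_along_axis(out, np.argmax(score, axis=-1)[..., np.newaxis], 1.0, axis=-1)
    return out

def sequence_mask(lengths, max_time):
    # mask[i, t] is True when t < lengths[i].
    return np.arange(max_time)[np.newaxis, :] < np.asarray(lengths)[:, np.newaxis]

def mask_memory(memory, lengths):
    # Zero out memory rows past each sequence length (the role played by `values`).
    mask = sequence_mask(lengths, memory.shape[1])
    return memory * mask[:, :, np.newaxis]

def masked_probabilities(score, lengths, probability_fn=softmax,
                         score_mask_value=-np.inf):
    # Replace scores past each sequence length, then convert to probabilities.
    # Every length must be >= 1; a fully masked row would give NaN under softmax.
    mask = sequence_mask(lengths, score.shape[1])
    return probability_fn(np.where(mask, score, score_mask_value))

score = np.array([[1.0, 2.0, 3.0, 4.0],
                  [0.5, 0.5, 9.0, 9.0]])
lengths = [2, 3]

p = masked_probabilities(score, lengths)                         # softmax over valid steps
h = masked_probabilities(score, lengths, probability_fn=hardmax)  # one-hot at steps 1 and 2
values = mask_memory(np.ones((2, 4, 3)), lengths)
```

Note that the large score at the last step of the second row is ignored, because that step lies past the sequence length of 3.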
| Attributes | |
|---|---|
| alignments_size | |
| batch_size | |
| keys | |
| memory_layer | |
| query_layer | |
| state_size | |
| values | |
Methods
initial_alignments
initial_alignments(
    batch_size, dtype
)
Creates the initial alignment values for the AttentionWrapper class.
This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention).
The default behavior is to return a tensor of all zeros.
| Args | |
|---|---|
| batch_size | `int32` scalar, the batch size. | 
| dtype | The dtype. | 
| Returns | |
|---|---|
| A `dtype` tensor shaped `[batch_size, alignments_size]` (`alignments_size` is the values' `max_time`). | 
initial_state
initial_state(
    batch_size, dtype
)
Creates the initial state values for the AttentionWrapper class.
This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention).
The default behavior is to return the same output as initial_alignments.
| Args | |
|---|---|
| batch_size | `int32` scalar, the batch size. | 
| dtype | The dtype. | 
| Returns | |
|---|---|
| A structure of all-zero tensors with shapes as described by state_size. | 
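A minimal NumPy sketch of the default behavior of both initialization methods. In the real class `alignments_size` is read from the mechanism (the memory's `max_time`); here it is passed explicitly, and both functions are hypothetical stand-ins. For `BahdanauAttention` the state is assumed to have the same shape as the alignments, so `initial_state` simply delegates.

```python
import numpy as np

def initial_alignments(batch_size, alignments_size, dtype=np.float32):
    # Default behavior: all zeros, shape [batch_size, alignments_size].
    return np.zeros((batch_size, alignments_size), dtype=dtype)

def initial_state(batch_size, alignments_size, dtype=np.float32):
    # Assumed here: the state has the same structure as the alignments.
    return initial_alignments(batch_size, alignments_size, dtype)

a0 = initial_alignments(batch_size=3, alignments_size=7)
s0 = initial_state(batch_size=3, alignments_size=7)
print(a0.shape, s0.shape, a0.any())  # (3, 7) (3, 7) False
```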
__call__
__call__(
    query, state
)
Score the query based on the keys and values.
| Args | |
|---|---|
| query | Tensor of dtype matching `self.values` and shape `[batch_size, query_depth]`. | 
| state | Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). | 
| Returns | |
|---|---|
| alignments | Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). |
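One call to the mechanism can be sketched in NumPy as follows, assuming the plain (unnormalized) score from TensorFlow's implementation and `tf.nn.softmax` as `probability_fn`. The additive score depends only on the query and the keys, so `state` is accepted but unused here; mechanisms such as monotonic attention are the ones that read it. `bahdanau_call`, `W_query`, and the random `keys` are hypothetical (in the real class the keys are derived from the memory by `memory_layer`). The final `einsum` shows, as an assumption about downstream use, how the alignments weight the values to form a context vector.

```python
import numpy as np

def softmax(x):
    # Stable softmax over the last (time) axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def bahdanau_call(query, state, keys, W_query, v, probability_fn=softmax):
    """Sketch of one scoring step (hypothetical helper).

    query: [batch_size, query_depth]; keys: [batch_size, max_time, num_units].
    `state` (the previous alignments) is accepted for interface parity but is
    not used by the additive score.
    Returns alignments of shape [batch_size, max_time].
    """
    del state  # not needed for Bahdanau scoring
    processed_query = query @ W_query  # hypothetical stand-in for query_layer
    score = np.sum(v * np.tanh(keys + processed_query[:, np.newaxis, :]), axis=2)
    return probability_fn(score)

rng = np.random.default_rng(1)
batch_size, max_time, query_depth, num_units, value_depth = 2, 5, 4, 3, 8
keys = rng.standard_normal((batch_size, max_time, num_units))
values = rng.standard_normal((batch_size, max_time, value_depth))
query = rng.standard_normal((batch_size, query_depth))
W_query = rng.standard_normal((query_depth, num_units))
v = rng.standard_normal(num_units)

state = np.zeros((batch_size, max_time))  # e.g. the all-zero initial state
alignments = bahdanau_call(query, state, keys, W_query, v)

# Weighted sum of values over time, giving one context vector per batch entry.
context = np.einsum("bt,btd->bd", alignments, values)
print(alignments.shape, context.shape)  # (2, 5) (2, 8)
```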