tf.contrib.seq2seq.LuongAttention
Implements Luong-style (multiplicative) attention scoring.
tf.contrib.seq2seq.LuongAttention(
num_units, memory, memory_sequence_length=None, scale=False,
probability_fn=None, score_mask_value=None, dtype=None,
custom_key_value_fn=None, name='LuongAttention'
)
This attention has two forms. The first is standard Luong attention,
as described in:
Minh-Thang Luong, Hieu Pham, Christopher D. Manning.
Effective Approaches to Attention-based Neural Machine Translation.
EMNLP 2015.
The second is the scaled form inspired partly by the normalized form of
Bahdanau attention.
To enable the second form, construct the object with parameter `scale=True`.
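For reference, a sketch of the two scoring forms (the symbols below are not defined on this page): with decoder query h_t, memory (encoder) state h̄_s, the learned memory projection W, and a learned scalar g that is only present when `scale=True`,

\mathrm{score}(h_t, \bar{h}_s) = h_t^{\top} W \bar{h}_s \qquad \text{(standard form)}
\mathrm{score}_{\mathrm{scaled}}(h_t, \bar{h}_s) = g \, h_t^{\top} W \bar{h}_s \qquad \text{(scaled form, } \texttt{scale=True}\text{)}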
Args

| Argument | Description |
|---|---|
| `num_units` | The depth of the attention mechanism. |
| `memory` | The memory to query; usually the output of an RNN encoder. This tensor should be shaped `[batch_size, max_time, ...]`. |
| `memory_sequence_length` | (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths. |
| `scale` | Python boolean. Whether to scale the energy term. |
| `probability_fn` | (optional) A `callable`. Converts the score to probabilities. The default is `tf.nn.softmax`. Other options include `tf.contrib.seq2seq.hardmax` and `tf.contrib.sparsemax.sparsemax`. Its signature should be: `probabilities = probability_fn(score)`. |
| `score_mask_value` | (optional) The mask value for the score before passing into `probability_fn`. The default is `-inf`. Only used if `memory_sequence_length` is not None. |
| `dtype` | The data type for the memory layer of the attention mechanism. |
| `custom_key_value_fn` | (optional) The custom function for computing keys and values. |
| `name` | Name to use when creating ops. |
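A minimal construction sketch (not from the original page; the placeholder shapes, depths, and the LSTM decoder cell below are illustrative assumptions for a TF 1.x graph):

import tensorflow as tf  # TF 1.x, where tf.contrib is available

# Stand-in encoder outputs; in a real model these come from an RNN encoder.
encoder_outputs = tf.placeholder(tf.float32, [None, None, 128])   # [batch_size, max_time, depth]
encoder_sequence_length = tf.placeholder(tf.int32, [None])        # true length of each batch entry

attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units=256,
    memory=encoder_outputs,
    memory_sequence_length=encoder_sequence_length,
    scale=True)  # enable the second (scaled) form

# The mechanism is typically consumed by an AttentionWrapper around a decoder cell.
# For LuongAttention the query depth must match num_units, so the decoder cell's
# output size is set to 256 here as well.
decoder_cell = tf.nn.rnn_cell.LSTMCell(256)
attention_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism, attention_layer_size=256)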
Attributes

`alignments_size`, `batch_size`, `keys`, `memory_layer`, `query_layer`, `state_size`, `values`
Methods
initial_alignments
initial_alignments(
batch_size, dtype
)
Creates the initial alignment values for the AttentionWrapper
class.
This is important for AttentionMechanisms that use the previous alignment
to calculate the alignment at the next time step (e.g. monotonic attention).
The default behavior is to return a tensor of all zeros.
Args

| Argument | Description |
|---|---|
| `batch_size` | `int32` scalar, the batch size. |
| `dtype` | The `dtype`. |

Returns

A `dtype` tensor shaped `[batch_size, alignments_size]` (`alignments_size` is the values' `max_time`).
initial_state
initial_state(
batch_size, dtype
)
Creates the initial state values for the AttentionWrapper
class.
This is important for AttentionMechanisms that use the previous alignment
to calculate the alignment at the next time step (e.g. monotonic attention).
The default behavior is to return the same output as initial_alignments.
Args

| Argument | Description |
|---|---|
| `batch_size` | `int32` scalar, the batch size. |
| `dtype` | The `dtype`. |

Returns

A structure of all-zero tensors with shapes as described by `state_size`.
__call__
__call__(
query, state
)
Score the query based on the keys and values.
Args

| Argument | Description |
|---|---|
| `query` | Tensor of dtype matching `self.values` and shape `[batch_size, query_depth]`. |
| `state` | Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). |

Returns

`alignments`: Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`).
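Direct calls are rarely needed because `AttentionWrapper` invokes the mechanism internally, but a minimal sketch, continuing the construction example above and assuming this TF 1.x version returns the next attention state alongside the alignments, looks like:

query = tf.placeholder(tf.float32, [None, 256])   # decoder state; depth must equal num_units
state = attention_mechanism.initial_state(tf.shape(query)[0], tf.float32)
alignments, next_state = attention_mechanism(query, state)
# alignments: [batch_size, alignments_size] probabilities over the memory's time steps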
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2020-10-01 UTC."],[],[],null,["# tf.contrib.seq2seq.LuongAttention\n\n\u003cbr /\u003e\n\n|-----------------------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py#L655-L750) |\n\nImplements Luong-style (multiplicative) attention scoring. \n\n tf.contrib.seq2seq.LuongAttention(\n num_units, memory, memory_sequence_length=None, scale=False,\n probability_fn=None, score_mask_value=None, dtype=None,\n custom_key_value_fn=None, name='LuongAttention'\n )\n\nThis attention has two forms. The first is standard Luong attention,\nas described in:\n\nMinh-Thang Luong, Hieu Pham, Christopher D. Manning.\n[Effective Approaches to Attention-based Neural Machine Translation.\nEMNLP 2015.](https://arxiv.org/abs/1508.04025)\n\nThe second is the scaled form inspired partly by the normalized form of\nBahdanau attention.\n\nTo enable the second form, construct the object with parameter\n`scale=True`.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `num_units` | The depth of the attention mechanism. |\n| `memory` | The memory to query; usually the output of an RNN encoder. This tensor should be shaped `[batch_size, max_time, ...]`. |\n| `memory_sequence_length` | (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths. |\n| `scale` | Python boolean. Whether to scale the energy term. |\n| `probability_fn` | (optional) A `callable`. Converts the score to probabilities. The default is [`tf.nn.softmax`](../../../tf/nn/softmax). Other options include [`tf.contrib.seq2seq.hardmax`](../../../tf/contrib/seq2seq/hardmax) and [`tf.contrib.sparsemax.sparsemax`](../../../tf/contrib/sparsemax/sparsemax). Its signature should be: `probabilities = probability_fn(score)`. |\n| `score_mask_value` | (optional) The mask value for score before passing into `probability_fn`. The default is -inf. Only used if `memory_sequence_length` is not None. |\n| `dtype` | The data type for the memory layer of the attention mechanism. |\n| `custom_key_value_fn` | (optional): The custom function for computing keys and values. |\n| `name` | Name to use when creating ops. 
|\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Attributes ---------- ||\n|-------------------|---------------|\n| `alignments_size` | \u003cbr /\u003e \u003cbr /\u003e |\n| `batch_size` | \u003cbr /\u003e \u003cbr /\u003e |\n| `keys` | \u003cbr /\u003e \u003cbr /\u003e |\n| `memory_layer` | \u003cbr /\u003e \u003cbr /\u003e |\n| `query_layer` | \u003cbr /\u003e \u003cbr /\u003e |\n| `state_size` | \u003cbr /\u003e \u003cbr /\u003e |\n| `values` | \u003cbr /\u003e \u003cbr /\u003e |\n\n\u003cbr /\u003e\n\nMethods\n-------\n\n### `initial_alignments`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py#L191-L208) \n\n initial_alignments(\n batch_size, dtype\n )\n\nCreates the initial alignment values for the `AttentionWrapper` class.\n\nThis is important for AttentionMechanisms that use the previous alignment\nto calculate the alignment at the next time step (e.g. monotonic attention).\n\nThe default behavior is to return a tensor of all zeros.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|--------------|---------------------------------|\n| `batch_size` | `int32` scalar, the batch_size. |\n| `dtype` | The `dtype`. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `dtype` tensor shaped `[batch_size, alignments_size]` (`alignments_size` is the values' `max_time`). ||\n\n\u003cbr /\u003e\n\n### `initial_state`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py#L210-L225) \n\n initial_state(\n batch_size, dtype\n )\n\nCreates the initial state values for the `AttentionWrapper` class.\n\nThis is important for AttentionMechanisms that use the previous alignment\nto calculate the alignment at the next time step (e.g. monotonic attention).\n\nThe default behavior is to return the same output as initial_alignments.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|--------------|---------------------------------|\n| `batch_size` | `int32` scalar, the batch_size. |\n| `dtype` | The `dtype`. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A structure of all-zero tensors with shapes as described by `state_size`. ||\n\n\u003cbr /\u003e\n\n### `__call__`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py#L725-L750) \n\n __call__(\n query, state\n )\n\nScore the query based on the keys and values.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|---------|------------------------------------------------------------------------------------------------------------------------------|\n| `query` | Tensor of dtype matching `self.values` and shape `[batch_size, query_depth]`. |\n| `state` | Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|--------------|------------------------------------------------------------------------------------------------------------------------------|\n| `alignments` | Tensor of dtype matching `self.values` and shape `[batch_size, alignments_size]` (`alignments_size` is memory's `max_time`). |\n\n\u003cbr /\u003e"]]