tf.contrib.opt.MultitaskOptimizerWrapper
Optimizer wrapper making all-zero gradients harmless.
tf.contrib.opt.MultitaskOptimizerWrapper(
opt
)
This can be useful when a multi-task loss is used
and some components of the loss are
not present (e.g. masked out) in some training batches.
Technically their gradient would be zero,
which would normally affect the optimizer state
(e.g. push a running average toward zero).
However, this is not the desired behaviour,
since the missing loss component
should be treated as unknown rather than zero.
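To see why an all-zero gradient contaminates optimizer state, consider a momentum-style running average. The sketch below is purely illustrative (plain NumPy, a classic momentum rule, made-up names; it is not the wrapper's actual code): ten masked-out batches decay the accumulator toward zero unless those updates are skipped.

```python
import numpy as np

# Illustrative momentum accumulator update: acc = m * acc + grad.
# (A stand-in for optimizer slot state, not the real TF implementation.)
def momentum_step(acc, grad, m=0.9):
    return m * acc + grad

acc_naive = np.array([1.0, -2.0])  # state built up from earlier batches
acc_filtered = acc_naive.copy()

zero_grad = np.zeros(2)  # the task's loss was masked out in this batch

# Applying the all-zero gradient repeatedly decays the accumulator...
for _ in range(10):
    acc_naive = momentum_step(acc_naive, zero_grad)

# ...while filtering it out (what the wrapper does) leaves state intact.
print(acc_naive)     # decayed toward zero, roughly [0.35, -0.70]
print(acc_filtered)  # unchanged: [1.0, -2.0]
```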
This wrapper filters out all-zero gradient tensors,
thereby preserving the optimizer state.
If gradient clipping by global norm is used,
the provided function clip_gradients_by_global_norm
should be used (and specified explicitly by the user).
Otherwise the global norm would be underestimated
because of the all-zero tensors, which should be ignored.
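To make the clipping step concrete, here is a rough NumPy sketch of the usual clip-by-global-norm rule (the helper name and numbers are made up for illustration; `tf.contrib.opt.clip_gradients_by_global_norm` applies the same kind of rule while skipping the all-zero tensors, as noted above).

```python
import numpy as np

# Sketch of the standard global-norm clipping rule:
# every gradient is scaled by clip_norm / max(global_norm, clip_norm).
def clip_by_global_norm(grads, clip_norm):
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

present = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = 13
masked = np.zeros(5)                                # masked-out task

clipped, norm = clip_by_global_norm(present + [masked], clip_norm=5.0)
# norm is 13, so every gradient is scaled by 5/13
```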
The gradient calculation and application
are delegated to an underlying optimizer.
The gradient application is altered only for all-zero tensors.
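Conceptually, altering the gradient application amounts to dropping the all-zero (gradient, variable) pairs before delegating to the wrapped optimizer. The sketch below uses plain arrays and hypothetical variable names in place of TF graph tensors; the real wrapper works inside the TF graph, not on NumPy values.

```python
import numpy as np

# Conceptual filter only (not the wrapper's real implementation):
# keep a (gradient, variable) pair only if the gradient has any
# non-zero entry.
def filter_all_zero(gradvars):
    return [(g, v) for g, v in gradvars if np.any(g != 0.0)]

gradvars = [(np.array([0.1, -0.2]), "shared/w"),   # shared weights
            (np.zeros(3), "task_b/w")]             # task masked out

kept = filter_all_zero(gradvars)
# only the shared weights receive an update; task_b's optimizer
# state is left untouched
```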
Example:
momentum_optimizer = tf.compat.v1.train.MomentumOptimizer(
    learning_rate, momentum=0.9)
multitask_momentum_optimizer = tf.contrib.opt.MultitaskOptimizerWrapper(
    momentum_optimizer)
gradvars = multitask_momentum_optimizer.compute_gradients(
    loss)
gradvars_clipped, _ = tf.contrib.opt.clip_gradients_by_global_norm(
    gradvars, 15.0)
train_op = multitask_momentum_optimizer.apply_gradients(
    gradvars_clipped, global_step=batch)
Args:
  opt: an instance of a class that implements tf.train.Optimizer.
Last updated 2020-10-01 UTC.