Note that in the dense implementation of this algorithm, ms and mom are
updated even if grad is zero, whereas in this sparse implementation, ms
and mom are not updated in iterations during which grad is zero.
ms <- rho * ms_{t-1} + (1 - rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
var <- var - mom
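The three update lines above can be sketched in plain Python as a reading aid. This is a minimal illustration, not the op's actual kernel: the function name rmsprop_step and the use of Python lists in place of tensors are assumptions made for the example, and the step shown is the dense form of the update.

```python
import math

def rmsprop_step(var, ms, mom, grad, lr, rho, momentum, epsilon):
    """Hypothetical helper: one dense RMSProp step over lists of floats.

    Returns new (var, ms, mom) lists and leaves the inputs unmodified.
    """
    new_var, new_ms, new_mom = [], [], []
    for v, s, m, g in zip(var, ms, mom, grad):
        # ms <- rho * ms_{t-1} + (1 - rho) * grad * grad
        s = rho * s + (1.0 - rho) * g * g
        # mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
        m = momentum * m + lr * g / math.sqrt(s + epsilon)
        # var <- var - mom
        new_var.append(v - m)
        new_ms.append(s)
        new_mom.append(m)
    return new_var, new_ms, new_mom

# Example values are arbitrary; the second gradient entry is zero on purpose.
var, ms, mom = [1.0, 2.0], [1.0, 1.0], [0.2, 0.2]
grad = [0.5, 0.0]
new_var, new_ms, new_mom = rmsprop_step(
    var, ms, mom, grad, lr=0.1, rho=0.9, momentum=0.9, epsilon=1e-10
)
```

In this dense sketch the entry whose gradient is zero still has its ms and mom values decayed by rho and momentum, which is the dense behavior the note above describes.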
Args
var
A mutable Tensor. Must be one of the following types: float32, float64, int32, uint8, int16, int8, complex64, int64, qint8, quint8, qint32, bfloat16, uint16, complex128, half, uint32, uint64.
Should be from a Variable().
ms
A mutable Tensor. Must have the same type as var.
Should be from a Variable().
mom
A mutable Tensor. Must have the same type as var.
Should be from a Variable().
lr
A Tensor. Must have the same type as var.
Scaling factor. Must be a scalar.
rho
A Tensor. Must have the same type as var.
Decay rate. Must be a scalar.
momentum
A Tensor. Must have the same type as var.
Momentum coefficient applied to mom in the update above. Must be a scalar.
epsilon
A Tensor. Must have the same type as var.
Ridge term. Must be a scalar.
grad
A Tensor. Must have the same type as var. The gradient.
use_locking
An optional bool. Defaults to False.
If True, updating of the var, ms, and mom tensors is protected
by a lock; otherwise the behavior is undefined, but may exhibit less
contention.
Last updated 2022-10-27 UTC.