AMSGrad optimizer.
This algorithm is a modification of Adam with better convergence properties when close to local optima.
Reference: “On the Convergence of Adam and Beyond” (Reddi et al., 2018)
Declaration
public class AMSGrad<Model: Differentiable & KeyPathIterable>: Optimizer
where
  Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
    & KeyPathIterable,
  Model.TangentVector.VectorSpaceScalar == Float
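For orientation, a minimal usage sketch follows. It assumes the Swift for TensorFlow TensorFlow module; TinyModel, the tensors, and the loss are illustrative placeholders, and the only members taken from this page are the AMSGrad(for:learningRate:) initializer and the inherited update(_:along:) method.

import TensorFlow

// Illustrative one-layer model; any Layer-conforming type works the same way.
struct TinyModel: Layer {
    var dense = Dense<Float>(inputSize: 2, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = TinyModel()
let optimizer = AMSGrad(for: model, learningRate: 1e-3)

let x = Tensor<Float>(randomNormal: [4, 2])
let y = Tensor<Float>(zeros: [4, 1])

// One training step: differentiate the loss with respect to the model,
// then let AMSGrad apply the update in place.
let grads = gradient(at: model) { model in
    meanSquaredError(predicted: model(x), expected: y)
}
optimizer.update(&model, along: grads)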
-
Declaration
public typealias Model = Model
-
The learning rate.
Declaration
public var learningRate: Float
-
The exponential decay rate used for the first-moment estimates of the gradients.
Declaration
public var beta1: Float
-
The exponential decay rate used for the second-moment estimates of the gradients.
Declaration
public var beta2: Float
-
A small scalar added to the denominator to improve numerical stability.
Declaration
public var epsilon: Float
-
The learning rate decay.
Declaration
public var decay: Float
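The precise decay schedule is not documented on this page. As an assumption carried over from the convention used by related Adam-style optimizers, a nonzero decay scales the learning rate applied at a given step roughly as:

effectiveLearningRate = learningRate / (1 + decay * Float(step))  // assumed convention, not confirmed by this page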
-
The current step.
Declaration
public var step: Int
-
The first moments of the weights.
Declaration
public var firstMoments: Model.TangentVector
-
The second moments of the weights.
Declaration
public var secondMoments: Model.TangentVector
-
The maximum of the second moments of the weights.
Declaration
public var secondMomentsMax: Model.TangentVector
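For reference, the three quantities above follow the AMSGrad recurrences from the cited paper; the sketch below uses the paper's notation (g_t is the gradient at step t) and omits any bias correction the concrete implementation may apply.

firstMoments:      m_t     = beta1 * m_{t-1} + (1 - beta1) * g_t
secondMoments:     v_t     = beta2 * v_{t-1} + (1 - beta2) * g_t^2
secondMomentsMax:  vMax_t  = max(vMax_{t-1}, v_t)
parameter update:  theta_t = theta_{t-1} - learningRate * m_t / (sqrt(vMax_t) + epsilon)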
-
Declaration
public init(
  for model: __shared Model,
  learningRate: Float = 1e-3,
  beta1: Float = 0.9,
  beta2: Float = 0.999,
  epsilon: Float = 1e-8,
  decay: Float = 0
)
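A sketch of constructing the optimizer with non-default hyperparameters (the values are illustrative, not recommendations; model is any conforming model such as the TinyModel sketched above):

let optimizer = AMSGrad(
    for: model,
    learningRate: 2e-4,
    beta1: 0.9,
    beta2: 0.999,
    epsilon: 1e-7,
    decay: 1e-4
)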
-
Declaration
public required init(copying other: AMSGrad, to device: Device)
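A minimal sketch of the copying initializer; the Device value is assumed to come from the x10 Device API (Device.defaultXLA is an assumption here, not something this page documents):

let device = Device.defaultXLA  // assumed accessor; any Device value works
let optimizerOnDevice = AMSGrad(copying: optimizer, to: device)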