public class AdaGrad<Model: Differentiable>: Optimizer
where
Model.TangentVector: VectorProtocol & PointwiseMultiplicative
& ElementaryFunctions & KeyPathIterable,
Model.TangentVector.VectorSpaceScalar == Float
An AdaGrad optimizer.
Implements the AdaGrad (adaptive gradient) optimization algorithm. AdaGrad has parameter-specific learning rates, which are adapted relative to how frequently each parameter gets updated during training. Parameters that receive more updates have smaller learning rates.
AdaGrad individually adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the running sum of squares of gradient norms.
Reference: “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization” (Duchi et al., 2011)
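For concreteness, the following is a minimal sketch of that update rule for a single Tensor&lt;Float&gt; parameter. The free function adaGradStep is illustrative only, but its learningRate, epsilon, and accumulator names mirror the properties documented below.
import TensorFlow

// Illustrative sketch of the AdaGrad update for one parameter tensor.
func adaGradStep(
    parameter: inout Tensor<Float>,
    gradient: Tensor<Float>,
    accumulator: inout Tensor<Float>,
    learningRate: Float,
    epsilon: Float
) {
    // Accumulate the element-wise square of the gradient.
    accumulator += gradient * gradient
    // Scale the step inversely proportional to the square root of the accumulated sum.
    parameter -= learningRate * gradient / (sqrt(accumulator) + epsilon)
}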
-
Declaration
public typealias Model = Model
-
The learning rate.
Declaration
public var learningRate: Float
-
A small scalar added to the denominator to improve numerical stability.
Declaration
public var epsilon: Float
-
The running sum of squares of gradient norms.
Declaration
public var accumulator: Model.TangentVector
-
Creates an instance for model.
Declaration
public init(for model: __shared Model, learningRate: Float = 1e-3, initialAccumulatorValue: Float = 0.1, epsilon: Float = 1e-8)
Parameters
learningRate
The learning rate. The default value is 1e-3.
initialAccumulatorValue
The starting value for the running sum of squares of gradient norms. The default value is 0.1.
epsilon
A small scalar added to the denominator to improve numerical stability. The default value is 1e-8.
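A minimal usage sketch, assuming a hypothetical LinearModel layer and the standard Swift for TensorFlow training utilities (gradient(at:), meanSquaredError(predicted:expected:), and the optimizer's update(_:along:) method):
import TensorFlow

// A hypothetical model; any Layer with Float scalars would work here.
struct LinearModel: Layer {
    var dense = Dense<Float>(inputSize: 4, outputSize: 1)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = LinearModel()
let optimizer = AdaGrad(for: model, learningRate: 1e-2, initialAccumulatorValue: 0.1)

let x = Tensor<Float>(randomNormal: [8, 4])
let y = Tensor<Float>(zeros: [8, 1])

// One training step: compute the gradient of a loss and let the optimizer
// apply the AdaGrad update to the model's parameters.
let grad = gradient(at: model) { model -> Tensor<Float> in
    meanSquaredError(predicted: model(x), expected: y)
}
optimizer.update(&model, along: grad)
-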
Creates a copy of other, with its state placed on the specified device.
Declaration
public required init(copying other: AdaGrad, to device: Device)
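Continuing from the sketch above, a brief example of this initializer; Device.defaultXLA is an assumption based on the Swift for TensorFlow x10 device API, so substitute whichever Device value applies.
// Copies the optimizer, including its accumulator, onto another device.
// `Device.defaultXLA` is assumed here.
let optimizerOnXLA = AdaGrad(copying: optimizer, to: Device.defaultXLA)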