public class AdaDelta<Model: Differentiable>: Optimizer
where
Model.TangentVector: VectorProtocol & PointwiseMultiplicative
& ElementaryFunctions & KeyPathIterable,
Model.TangentVector.VectorSpaceScalar == Float
An AdaDelta optimizer.
Implements the AdaDelta optimization algorithm. AdaDelta is a stochastic gradient descent method based on first-order information. It adapts learning rates using a moving window of gradient updates rather than accumulating all past gradients, so it continues learning even after many updates have been made and adapts more quickly to the changing dynamics of the optimization problem.
Reference: “ADADELTA: An Adaptive Learning Rate Method” (Zeiler, 2012)
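The per-parameter update keeps two decaying accumulators and scales each gradient by the ratio of their root mean squares. Below is a minimal scalar sketch of one step, following Zeiler (2012); the names AdaDeltaState and adaDeltaStep are illustrative, and this mirrors the stored averageSquared and accumulatedDelta state rather than reproducing the library implementation:
struct AdaDeltaState {
    var averageSquared: Float = 0   // decaying average of squared gradients, E[g²]
    var accumulatedDelta: Float = 0 // decaying average of squared updates, E[Δx²]
}
func adaDeltaStep(
    _ parameter: inout Float,
    gradient g: Float,
    state: inout AdaDeltaState,
    learningRate: Float = 1,
    rho: Float = 0.95,
    epsilon: Float = 1e-6
) {
    // Accumulate the squared gradient with decay factor rho.
    state.averageSquared = rho * state.averageSquared + (1 - rho) * g * g
    // Scale the gradient by the ratio of the two RMS terms.
    let delta = ((state.accumulatedDelta + epsilon).squareRoot()
        / (state.averageSquared + epsilon).squareRoot()) * g
    // Accumulate the squared update with the same decay factor.
    state.accumulatedDelta = rho * state.accumulatedDelta + (1 - rho) * delta * delta
    parameter -= learningRate * delta
}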
-
Declaration
public typealias Model = Model
-
The learning rate.
Declaration
public var learningRate: Float
-
The decay factor, corresponding to the fraction of gradient to keep at each time step.
Declaration
public var rho: Float
-
A small scalar added to the denominator to improve numerical stability.
Declaration
public var epsilon: Float
-
The learning rate decay.
Declaration
public var decay: Float
-
The current step.
Declaration
public var step: Int
-
The accumulated, exponentially decaying average of squared gradients.
Declaration
public var averageSquared: Model.TangentVector
-
The accumulated parameter updates.
Declaration
public var accumulatedDelta: Model.TangentVector
-
Creates an instance for model.
Declaration
public init(
    for model: __shared Model,
    learningRate: Float = 1,
    rho: Float = 0.95,
    epsilon: Float = 1e-6,
    decay: Float = 0
)
Parameters
learningRate
The learning rate. The default value is 1.
rho
The decay factor. The default value is 0.95.
epsilon
A small scalar added to the denominator to improve numerical stability. The default value is 1e-6.
decay
The learning rate decay. The default value is 0.
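A hypothetical usage sketch, assuming the Swift for TensorFlow APIs (the TensorFlow module, Layer, gradient(at:), and Optimizer's update(_:along:)); the Linear model and the training data are illustrative:
import TensorFlow

struct Linear: Layer {
    var w = Tensor<Float>(randomNormal: [2, 1])

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        matmul(input, w)
    }
}

var model = Linear()
let optimizer = AdaDelta(for: model, rho: 0.95, epsilon: 1e-6)

let x = Tensor<Float>([[1, 2]])
let y = Tensor<Float>([[3]])
// Differentiate the loss with respect to the model and take one AdaDelta step.
let grads = gradient(at: model) { model -> Tensor<Float> in
    meanSquaredError(predicted: model(x), expected: y)
}
optimizer.update(&model, along: grads)
-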
Declaration
public required init(copying other: AdaDelta, to device: Device)
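This initializer replicates the optimizer, including its accumulated state, onto the given device. A brief hypothetical sketch, assuming the optimizer from the example above and the Swift for TensorFlow Device.defaultXLA constant as an example target:
// Copy the optimizer and its accumulators to the default XLA device.
let xlaOptimizer = AdaDelta(copying: optimizer, to: Device.defaultXLA)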