RAdam optimizer.

Rectified Adam, a variant of Adam that introduces a term to rectify the variance of the adaptive learning rate.

Reference: “On the Variance of the Adaptive Learning Rate and Beyond”

Declaration

public class RAdam<Model: Differentiable>: Optimizer
where
  Model.TangentVector: VectorProtocol & PointwiseMultiplicative & ElementaryFunctions
    & KeyPathIterable,
  Model.TangentVector.VectorSpaceScalar == Float
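A sketch of the rectification from the referenced paper, in the paper's notation (t is the step, β₂ the second-moment decay rate; ρ is the length of the approximated simple moving average):

```latex
\rho_\infty = \frac{2}{1 - \beta_2} - 1,
\qquad
\rho_t = \rho_\infty - \frac{2 t \beta_2^t}{1 - \beta_2^t}
```

When ρ_t > 4, the adaptive step is scaled by the rectification term

```latex
r_t = \sqrt{\frac{(\rho_t - 4)(\rho_t - 2)\,\rho_\infty}{(\rho_\infty - 4)(\rho_\infty - 2)\,\rho_t}}
```

otherwise the variance of the adaptive learning rate is intractable and the update falls back to an unadapted step using only the first moment (SGD with momentum).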
-
Declaration
public typealias Model = Model
-
The learning rate.
Declaration
public var learningRate: Float
-
The exponential decay rate for the first-moment (mean) estimates of the gradients.
Declaration
public var beta1: Float
-
The exponential decay rate for the second-moment (uncentered variance) estimates of the gradients.
Declaration
public var beta2: Float
-
A small scalar added to the denominator to improve numerical stability.
Declaration
public var epsilon: Float
-
The learning rate decay.
Declaration
public var decay: Float
-
The current training step count.
Declaration
public var step: Int
-
The first moments of the weights.
Declaration
public var firstMoments: Model.TangentVector
-
The second moments of the weights.
Declaration
public var secondMoments: Model.TangentVector
-
Declaration
public init(
  for model: __shared Model,
  learningRate: Float = 1e-3,
  beta1: Float = 0.9,
  beta2: Float = 0.999,
  epsilon: Float = 1e-8,
  decay: Float = 0
)
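A minimal usage sketch, assuming the Swift for TensorFlow `Optimizer` API; `MyModel`, `x`, and `y` are hypothetical placeholders for a `Layer`-conforming model and a training batch:

```swift
import TensorFlow

// `MyModel`, `x`, and `y` are assumed to exist; they are not part of this API.
var model = MyModel()
let optimizer = RAdam(for: model, learningRate: 1e-3)

// Compute the gradient of a loss with respect to the model's parameters.
let grad = gradient(at: model) { model -> Tensor<Float> in
  let logits = model(x)
  return softmaxCrossEntropy(logits: logits, labels: y)
}

// Apply one rectified-Adam update along the model's tangent vector.
optimizer.update(&model, along: grad)
```

`update(_:along:)` advances `step` and, per the reference above, uses the rectified adaptive step only once enough second-moment statistics have accumulated.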
-
Declaration
public required init(copying other: RAdam, to device: Device)