Optimization parameters for Ftrl with TPU embeddings.
tf.compat.v1.tpu.experimental.FtrlParameters(
    learning_rate: float,
    learning_rate_power: float = -0.5,
    initial_accumulator_value: float = 0.1,
    l1_regularization_strength: float = 0.0,
    l2_regularization_strength: float = 0.0,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None,
    multiply_linear_by_learning_rate: bool = False,
    beta: float = 0,
    allow_zero_accumulator: bool = False,
    clip_gradient_min: Optional[float] = None,
    clip_gradient_max: Optional[float] = None
)
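To show how learning_rate, learning_rate_power, l1_regularization_strength, l2_regularization_strength and beta interact, here is a minimal scalar sketch of one FTRL-Proximal step for a single embedding coordinate. It follows the update documented for tf.raw_ops.ApplyFtrl, with beta folded into the denominator the way the Keras Ftrl optimizer does. That folding is an assumption: the TPU embedding kernel may differ in details. Weight and gradient clipping, weight decay, gradient accumulation and multiply_linear_by_learning_rate are deliberately left out, and ftrl_step is a hypothetical name, not a TensorFlow function.

```python
import math


def ftrl_step(w, accum, linear, grad, *, learning_rate,
              learning_rate_power=-0.5, l1=0.0, l2=0.0, beta=0.0):
    """One FTRL-Proximal step for a single coordinate (illustrative sketch).

    w:      current embedding value for this coordinate
    accum:  squared-gradient accumulator (starts at initial_accumulator_value)
    linear: the "linear" slot, often called z (starts at 0)
    Returns the updated (w, accum, linear).
    """
    # Accumulate the squared gradient.
    accum_new = accum + grad * grad
    # Change in the per-coordinate learning-rate term; with the default
    # learning_rate_power=-0.5 this is (sqrt(accum_new) - sqrt(accum)) / lr.
    sigma = (accum_new ** -learning_rate_power
             - accum ** -learning_rate_power) / learning_rate
    linear_new = linear + grad - sigma * w
    if abs(linear_new) <= l1:
        # L1 regularization drives small coordinates exactly to zero.
        w_new = 0.0
    else:
        # beta enters as (accum^(-power) + beta) / lr, plus the L2 term.
        quadratic = ((accum_new ** -learning_rate_power + beta) / learning_rate
                     + 2.0 * l2)
        w_new = (math.copysign(l1, linear_new) - linear_new) / quadratic
    return w_new, accum_new, linear_new
```

For example, ftrl_step(0.0, 0.1, 0.0, 1.0, learning_rate=0.1) starts from the default initial_accumulator_value of 0.1 and moves the weight to -0.1 / sqrt(1.1). Passing learning_rate_power=0.0 instead makes the accumulator term constant, which is the fixed-learning-rate case described for that argument.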
Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec via the
optimization_parameters argument to set the optimizer and its parameters.
See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec
for more details.
estimator = tf.estimator.tpu.TPUEstimator(
    ...
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        ...
        # Ftrl with learning_rate=0.1; all other arguments keep their defaults.
        optimization_parameters=tf.tpu.experimental.FtrlParameters(0.1),
        ...))
| Args | |
|---|---|
| learning_rate | A floating-point value. The learning rate. |
| learning_rate_power | A float value; must be less than or equal to zero. Controls how the learning rate decreases during training. Use zero for a fixed learning rate. See section 3.1 of the FTRL paper (McMahan et al., 2013, "Ad Click Prediction: a View from the Trenches"). |
| initial_accumulator_value | The starting value for the accumulators. Only zero or positive values are allowed. |
| l1_regularization_strength | A float value; must be greater than or equal to zero. |
| l2_regularization_strength | A float value; must be greater than or equal to zero. |
| use_gradient_accumulation | Setting this to False makes the embedding gradient calculation less accurate but faster. See optimization_parameters.proto for details. |
| clip_weight_min | The minimum value to clip weights to; None means -infinity. |
| clip_weight_max | The maximum value to clip weights to; None means +infinity. |
| weight_decay_factor | The amount of weight decay to apply; None means that the weights are not decayed. |
| multiply_weight_decay_factor_by_learning_rate | If True, weight_decay_factor is multiplied by the current learning rate. |
| multiply_linear_by_learning_rate | When True, multiplies the usages of the linear slot in the weight update by the learning rate. This is useful when ramping up the learning rate from 0, which would otherwise produce NaNs. |
| beta | The beta parameter for FTRL. |
| allow_zero_accumulator | Changes the implementation of the square root to allow for the case of initial_accumulator_value being zero. This causes a slight performance drop. |
| clip_gradient_min | The minimum value to clip gradients to; None means -infinity. use_gradient_accumulation must be True if this is set. |
| clip_gradient_max | The maximum value to clip gradients to; None means +infinity. use_gradient_accumulation must be True if this is set. |
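The argument constraints listed in the table can be collected into a single check. The helper below, check_ftrl_args, is hypothetical and not part of TensorFlow: it only restates the documented rules in plain Python and does not reproduce whatever validation the real FtrlParameters constructor performs.

```python
from typing import List, Optional


def check_ftrl_args(
    learning_rate_power: float = -0.5,
    initial_accumulator_value: float = 0.1,
    l1_regularization_strength: float = 0.0,
    l2_regularization_strength: float = 0.0,
    use_gradient_accumulation: bool = True,
    clip_gradient_min: Optional[float] = None,
    clip_gradient_max: Optional[float] = None,
) -> List[str]:
    """Return the documented constraints that are violated (empty if none)."""
    problems = []
    # learning_rate_power: must be less than or equal to zero.
    if learning_rate_power > 0:
        problems.append("learning_rate_power must be <= 0")
    # initial_accumulator_value: only zero or positive values are allowed.
    if initial_accumulator_value < 0:
        problems.append("initial_accumulator_value must be >= 0")
    # Both regularization strengths must be non-negative.
    if l1_regularization_strength < 0:
        problems.append("l1_regularization_strength must be >= 0")
    if l2_regularization_strength < 0:
        problems.append("l2_regularization_strength must be >= 0")
    # Gradient clipping requires gradient accumulation.
    clipping = clip_gradient_min is not None or clip_gradient_max is not None
    if clipping and not use_gradient_accumulation:
        problems.append("gradient clipping requires use_gradient_accumulation=True")
    return problems
```

With the defaults, check_ftrl_args() returns an empty list, while check_ftrl_args(use_gradient_accumulation=False, clip_gradient_max=1.0) reports the clipping rule. Returning a list rather than raising lets every violated constraint surface at once.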