tfp.optimizer.proximal_hessian_sparse_one_step
One step of (the outer loop of) the minimization algorithm.
tfp.optimizer.proximal_hessian_sparse_one_step(
gradient_unregularized_loss,
hessian_unregularized_loss_outer,
hessian_unregularized_loss_middle,
x_start,
tolerance,
l1_regularizer,
l2_regularizer=None,
maximum_full_sweeps=1,
learning_rate=None,
name=None
)
This function returns a new value of x, equal to x_start + x_update. The
increment x_update in R^n is computed by a coordinate descent method, that is,
by a loop in which each iteration updates exactly one coordinate of x_update.
(Some updates may leave the value of the coordinate unchanged.)
The particular update method used is to apply an L1-based proximity operator,
"soft threshold", whose fixed point x_update_fix is the desired minimum

  x_update_fix = argmin{
      Loss(x_start + x_update')
        + l1_regularizer * ||x_start + x_update'||_1
        + l2_regularizer * ||x_start + x_update'||_2**2
      : x_update' }

where in each iteration x_update' is constrained to have at most one nonzero
coordinate.
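For illustration, here is a minimal sketch (not part of this function's API;
the helper name soft_threshold is hypothetical) of the scalar soft-threshold
operator that such a coordinate update applies:

  import tensorflow as tf

  def soft_threshold(z, threshold):
    # Proximity operator of the L1 norm: shrink z toward zero by `threshold`,
    # mapping it exactly to zero whenever |z| <= threshold.
    return tf.sign(z) * tf.maximum(tf.abs(z) - threshold, 0.)

  # Entries with magnitude at most 0.5 are zeroed out, preserving sparsity.
  soft_threshold(tf.constant([-2.0, -0.3, 0.0, 0.4, 1.5]), 0.5)

A comparable helper is exposed by the library as tfp.math.soft_threshold.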
This update method preserves sparsity, i.e., tends to find sparse solutions if
x_start is sparse. Additionally, the choice of step size is based on curvature
(the Hessian), which significantly speeds up convergence.

This algorithm assumes that Loss is convex, at least in a region surrounding
the optimum. (If l2_regularizer > 0, then only weak convexity is needed.)
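As a hedged illustration (the data A, b and all numeric values below are made
up for the example), the following sketch applies one step to a small
L1-regularized least-squares problem, where the exact Hessian A^T A factors
with outer = A and middle = ones:

  import tensorflow as tf
  import tensorflow_probability as tfp

  # Toy problem: Loss(x) = 0.5 * ||A @ x - b||_2**2,
  # so gradient = A^T (A x - b) and Hessian = A^T @ diag(ones) @ A.
  A = tf.random.normal([20, 5])   # "outer" factor, shape [N, n]
  b = tf.random.normal([20])
  x_start = tf.zeros([5])

  residual = tf.linalg.matvec(A, x_start) - b
  gradient = tf.linalg.matvec(A, residual, transpose_a=True)

  x, is_converged, num_iters = tfp.optimizer.proximal_hessian_sparse_one_step(
      gradient_unregularized_loss=gradient,
      hessian_unregularized_loss_outer=A,
      hessian_unregularized_loss_middle=tf.ones([20]),  # exact Hessian here
      x_start=x_start,
      tolerance=1e-6,
      l1_regularizer=0.1,
      maximum_full_sweeps=5)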
Args

gradient_unregularized_loss: (Batch of) Tensor with the same shape and dtype
  as x_start, representing the gradient, evaluated at x_start, of the
  unregularized loss function (denoted Loss above). (In all current use
  cases, Loss is the negative log likelihood.)
hessian_unregularized_loss_outer: (Batch of) Tensor or SparseTensor having
  the same dtype as x_start, and shape [N, n] where x_start has shape [n],
  satisfying the property
    Transpose(hessian_unregularized_loss_outer)
      @ diag(hessian_unregularized_loss_middle)
      @ hessian_unregularized_loss_outer
    = (approximation of) the Hessian matrix of Loss, evaluated at x_start.
hessian_unregularized_loss_middle: (Batch of) vector-shaped Tensor having the
  same dtype as x_start, and shape [N] where hessian_unregularized_loss_outer
  has shape [N, n], satisfying the property
    Transpose(hessian_unregularized_loss_outer)
      @ diag(hessian_unregularized_loss_middle)
      @ hessian_unregularized_loss_outer
    = (approximation of) the Hessian matrix of Loss, evaluated at x_start.
  (A logistic-regression illustration of this factorization appears after the
  argument list below.)
x_start: (Batch of) vector-shaped, float Tensor representing the current
  value of the argument to the Loss function.
tolerance: scalar, float Tensor representing the convergence threshold. The
  optimization step will terminate early, returning its current value of
  x_start + x_update, once the following condition is met:
    ||x_update_end - x_update_start||_2 / (1 + ||x_start||_2) < sqrt(tolerance),
  where x_update_end is the value of x_update at the end of a sweep and
  x_update_start is the value of x_update at the beginning of that sweep.
l1_regularizer: scalar, float Tensor representing the weight of the L1
  regularization term (see equation above). If L1 regularization is not
  required, then tfp.glm.fit_one_step is preferable.
l2_regularizer: scalar, float Tensor representing the weight of the L2
  regularization term (see equation above).
  Default value: None (i.e., no L2 regularization).
maximum_full_sweeps: Python integer specifying the maximum number of sweeps
  to run. A "sweep" consists of an iteration of coordinate descent on each
  coordinate. After this many sweeps, the algorithm will terminate even if
  convergence has not been reached.
  Default value: 1.
learning_rate: scalar, float Tensor representing a multiplicative factor
  used to dampen the proximal gradient descent steps.
  Default value: None (i.e., factor is conceptually 1).
name: Python string representing the name of the TensorFlow operation.
  The default name is "minimize_one_step".
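To make the outer/middle factorization concrete, here is a hedged sketch
(the data X, y and regularizer values are illustrative) for an unregularized
logistic-regression negative log likelihood, whose gradient is X^T (p - y)
and whose Hessian is X^T @ diag(p * (1 - p)) @ X:

  import tensorflow as tf
  import tensorflow_probability as tfp

  # Illustrative data for a logistic-regression negative log likelihood.
  X = tf.random.normal([100, 10])                      # design matrix, shape [N, n]
  y = tf.cast(tf.random.uniform([100]) > 0.5, tf.float32)
  x_start = tf.zeros([10])                             # coefficients

  p = tf.sigmoid(tf.linalg.matvec(X, x_start))         # predicted probabilities
  gradient = tf.linalg.matvec(X, p - y, transpose_a=True)

  x, is_converged, _ = tfp.optimizer.proximal_hessian_sparse_one_step(
      gradient_unregularized_loss=gradient,
      hessian_unregularized_loss_outer=X,              # outer factor: design matrix
      hessian_unregularized_loss_middle=p * (1. - p),  # middle factor: per-example curvature
      x_start=x_start,
      tolerance=1e-6,
      l1_regularizer=0.5)

A complete fitting loop would recompute the gradient and the middle factor at
each new x and call this one-step routine again until is_converged is True;
for GLMs, tfp.glm.fit_sparse implements such an outer loop.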
Returns

x: (Batch of) Tensor having the same shape and dtype as x_start, representing
  the updated value of x, that is, x_start + x_update.
is_converged: scalar, bool Tensor indicating whether convergence occurred
  across all batches within the specified number of sweeps.
iter: scalar, int Tensor representing the actual number of coordinate updates
  made (before achieving convergence). Since each sweep consists of
  tf.size(x_start) iterations, the maximum number of updates is
  maximum_full_sweeps * tf.size(x_start).
References
[1]: Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths
for Generalized Linear Models via Coordinate Descent. Journal of
Statistical Software, 33(1), 2010.
https://www.jstatsoft.org/article/view/v033i01/v33i01.pdf
[2]: Guo-Xun Yuan, Chia-Hua Ho and Chih-Jen Lin. An Improved GLMNET for
L1-regularized Logistic Regression. Journal of Machine Learning
Research, 13, 2012.
http://www.jmlr.org/papers/volume13/yuan12a/yuan12a.pdf