# Trim insignificant weights

This document provides an overview of model pruning to help you determine how it fits with your use case.

- To dive right into an end-to-end example, see the [Pruning with Keras](/model_optimization/guide/pruning/pruning_with_keras) example.
- To quickly find the APIs you need for your use case, see the [pruning comprehensive guide](/model_optimization/guide/pruning/comprehensive_guide).
- To explore the application of pruning for on-device inference, see [Pruning for on-device inference with XNNPACK](/model_optimization/guide/pruning/pruning_for_on_device_inference).
- To see an example of structural pruning, run the tutorial [Structural pruning with 2 by 4 sparsity](/model_optimization/guide/pruning/pruning_with_sparsity_2_by_4).

Overview
--------

Magnitude-based weight pruning gradually zeroes out model weights during the training process to achieve model sparsity. Sparse models are easier to compress, and we can skip the zeroes during inference for latency improvements.
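To make the mechanics concrete, here is a minimal NumPy sketch of a single pruning step: it zeroes out the smallest-magnitude fraction of a weight tensor. This illustrates the core idea only; the library applies it gradually over training according to a pruning schedule.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    # Keep only weights whose magnitude exceeds the threshold.
    return np.where(np.abs(weights) > threshold, weights, 0.0)

w = np.random.randn(4, 4).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.5)
print((w_pruned == 0).mean())  # roughly half the weights are now zero
```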
Currently, this technique delivers its benefits through model compression; framework support for latency improvements is on the roadmap. We've seen up to 6x model compression with minimal loss of accuracy.
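The compression gain is easy to observe directly: a weight buffer that is mostly zeros compresses far better than dense random weights under any general-purpose compressor. A self-contained sketch (the array size and sparsity level below are illustrative):

```python
import gzip
import numpy as np

dense = np.random.randn(1000, 1000).astype(np.float32)

# Simulate 90% magnitude pruning: zero everything below the 90th
# percentile of absolute values.
sparse = dense.copy()
sparse[np.abs(sparse) < np.quantile(np.abs(sparse), 0.9)] = 0.0

print("dense :", len(gzip.compress(dense.tobytes())), "bytes")
print("sparse:", len(gzip.compress(sparse.tobytes())), "bytes")  # much smaller
```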
The technique is being evaluated in various speech applications, such as speech recognition and text-to-speech, and has been experimented with across various vision and translation models.
### API Compatibility Matrix
Users can apply pruning with the following APIs:
- Model building: `keras` with only Sequential and Functional models
- TensorFlow versions: TF 1.x for versions 1.14+ and 2.x.
  - [`tf.compat.v1`](https://www.tensorflow.org/api_docs/python/tf/compat/v1) with a TF 2.X package and `tf.compat.v2` with a TF 1.X package are not supported.
- TensorFlow execution mode: both graph and eager
- Distributed training: [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute) with only graph execution
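As a concrete illustration, the sketch below wraps a Sequential model with `prune_low_magnitude`. The layer sizes, schedule parameters, and training-data names (`x_train`, `y_train`) are illustrative placeholders, not a prescription; see the comprehensive guide for the full API.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# An illustrative Sequential model; shapes and sizes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Ramp sparsity from 50% to 80% over the first 1000 steps (example values).
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.5, final_sparsity=0.8, begin_step=0, end_step=1000)

pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# UpdatePruningStep must be passed to fit() so the schedule advances.
# x_train / y_train are placeholders for your training data:
# pruned_model.fit(x_train, y_train, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```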
It is on our roadmap to add support in the following areas:
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-02-03 UTC."],[],[],null,["# Trim insignificant weights\n\n\u003cbr /\u003e\n\nThis document provides an overview on model pruning to help you determine how it\nfits with your use case.\n\n- To dive right into an end-to-end example, see the [Pruning with Keras](/model_optimization/guide/pruning/pruning_with_keras) example.\n- To quickly find the APIs you need for your use case, see the [pruning comprehensive guide](/model_optimization/guide/pruning/comprehensive_guide).\n- To explore the application of pruning for on-device inference, see the [Pruning for on-device inference with XNNPACK](/model_optimization/guide/pruning/pruning_for_on_device_inference).\n- To see an example of structural pruning, run the tutorial [Structural pruning with 2 by 4 sparsity](/model_optimization/guide/pruning/pruning_with_sparsity_2_by_4).\n\nOverview\n--------\n\nMagnitude-based weight pruning gradually zeroes out model weights during the\ntraining process to achieve model sparsity. Sparse models are easier to\ncompress, and we can skip the zeroes during inference for latency improvements.\n\nThis technique brings improvements via model compression. In the future,\nframework support for this technique will provide latency improvements. We've\nseen up to 6x improvements in model compression with minimal loss of accuracy.\n\nThe technique is being evaluated in various speech applications, such as\nspeech recognition and text-to-speech, and has been experimented on across\nvarious vision and translation models.\n\n### API Compatibility Matrix\n\nUsers can apply pruning with the following APIs:\n\n- Model building: `keras` with only Sequential and Functional models\n- TensorFlow versions: TF 1.x for versions 1.14+ and 2.x.\n - [`tf.compat.v1`](https://www.tensorflow.org/api_docs/python/tf/compat/v1) with a TF 2.X package and `tf.compat.v2` with a TF 1.X package are not supported.\n- TensorFlow execution mode: both graph and eager\n- Distributed training: [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute) with only graph execution\n\nIt is on our roadmap to add support in the following areas:\n\n- [Minimal Subclassed model support](https://github.com/tensorflow/model-optimization/issues/155)\n- [Framework support for latency improvements](https://github.com/tensorflow/model-optimization/issues/173)\n\nResults\n-------\n\n### Image Classification\n\n| Model | Non-sparse Top-1 Accuracy | Random Sparse Accuracy | Random Sparsity | Structured Sparse Accuracy | Structured Sparsity |\n|-----------------|---------------------------|------------------------|-----------------|----------------------------|---------------------|\n| InceptionV3 | 78.1% | 78.0% | 50% | 75.8% | 2 by 4 |\n| InceptionV3 | 78.1% | 76.1% | 75% |\n| InceptionV3 | 78.1% | 74.6% | 87.5% |\n| MobilenetV1 224 | 71.04% | 70.84% | 50% | 67.35% | 2 by 4 |\n| MobilenetV2 224 | 71.77% | 69.64% | 50% | 66.75% | 2 by 4 |\n\nThe models were tested on Imagenet.\n\n### Translation\n\n| Model | Non-sparse BLEU | Sparse BLEU | Sparsity |\n|------------|-----------------|-------------|----------|\n| 
GNMT EN-DE | 26.77 | 26.86 | 80% |\n| GNMT EN-DE | 26.77 | 26.52 | 85% |\n| GNMT EN-DE | 26.77 | 26.19 | 90% |\n| GNMT DE-EN | 29.47 | 29.50 | 80% |\n| GNMT DE-EN | 29.47 | 29.24 | 85% |\n| GNMT DE-EN | 29.47 | 28.81 | 90% |\n\nThe models use WMT16 German and English dataset with news-test2013 as the dev\nset and news-test2015 as the test set.\n\n### Keyword spotting model\n\nDS-CNN-L is a keyword spotting model created for edge devices. It can be found\nin ARM software's\n[examples repository](https://github.com/ARM-software/ML-examples/tree/master/tflu-kws-cortex-m).\n\n| Model | Non-sparse Accuracy | Structured Sparse Accuracy (2 by 4 pattern) | Random Sparse Accuracy (target sparsity 50%) |\n|----------|---------------------|---------------------------------------------|----------------------------------------------|\n| DS-CNN-L | 95.23 | 94.33 | 94.84 |\n\nExamples\n--------\n\nIn addition to the [Prune with Keras](/model_optimization/guide/pruning/pruning_with_keras)\ntutorial, see the following examples:\n\n- Train a CNN model on the MNIST handwritten digit classification task with pruning: [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/mnist/mnist_cnn.py)\n- Train a LSTM on the IMDB sentiment classification task with pruning: [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/imdb/imdb_lstm.py)\n\nFor background, see *To prune, or not to prune: exploring the efficacy of\npruning for model compression* \\[[paper](https://arxiv.org/pdf/1710.01878.pdf)\\]."]]