Migrate single-worker multiple-GPU training
This guide demonstrates how to migrate the single-worker multiple-GPU workflows from TensorFlow 1 to TensorFlow 2.
To perform synchronous training across multiple GPUs on one machine:
- In TensorFlow 1, you use the tf.estimator.Estimator APIs with tf.distribute.MirroredStrategy.
- In TensorFlow 2, you can use Keras Model.fit or a custom training loop with tf.distribute.MirroredStrategy. Learn more in the Distributed training with TensorFlow guide.
Setup
Start with imports and a simple dataset for demonstration purposes:
import tensorflow as tf
import tensorflow.compat.v1 as tf1
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]
eval_features = [[4., 4.5], [5., 5.5], [6., 6.5]]
eval_labels = [[0.8], [0.9], [1.]]
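MirroredStrategy replicates the model across all GPUs that are visible to TensorFlow, so it can help to confirm what the runtime actually sees before running the examples. The following check is an optional convenience, not part of the original workflow:

# Optional: list the accelerators TensorFlow can see. MirroredStrategy mirrors
# variables across all visible GPUs; with zero or one GPU the examples still
# run, just without any cross-device replication.
print('Num GPUs available:', len(tf.config.list_physical_devices('GPU')))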
TensorFlow 1: Single-worker distributed training with tf.estimator.Estimator
This example demonstrates the TensorFlow 1 canonical workflow of single-worker multiple-GPU training. You need to set the distribution strategy (tf.distribute.MirroredStrategy) through the config parameter of tf.estimator.Estimator:
def _input_fn():
  return tf1.data.Dataset.from_tensor_slices((features, labels)).batch(1)

def _eval_input_fn():
  return tf1.data.Dataset.from_tensor_slices(
      (eval_features, eval_labels)).batch(1)

def _model_fn(features, labels, mode):
  logits = tf1.layers.Dense(1)(features)
  loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)
  optimizer = tf1.train.AdagradOptimizer(0.05)
  train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())
  return tf1.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

strategy = tf1.distribute.MirroredStrategy()
config = tf1.estimator.RunConfig(
    train_distribute=strategy, eval_distribute=strategy)
estimator = tf1.estimator.Estimator(model_fn=_model_fn, config=config)

train_spec = tf1.estimator.TrainSpec(input_fn=_input_fn)
eval_spec = tf1.estimator.EvalSpec(input_fn=_eval_input_fn)
tf1.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
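By default, MirroredStrategy replicates across every visible GPU. If you only want to use a subset of the devices on the machine, you can pass an explicit device list when constructing the strategy. A minimal sketch, assuming the machine exposes at least two GPUs under these device names:

# Sketch: restrict mirroring to the first two GPUs only; adjust the device
# names to match your machine.
strategy = tf1.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])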
TensorFlow 2: Single-worker training with Keras
When migrating to TensorFlow 2, you can use the Keras APIs with tf.distribute.MirroredStrategy.

If you use the tf.keras APIs for model building and Keras Model.fit for training, the main difference is that you instantiate the Keras model, an optimizer, and metrics in the context of Strategy.scope, instead of defining a config for tf.estimator.Estimator.

If you need to use a custom training loop, check out the Using tf.distribute.Strategy with custom training loops guide (a brief sketch also follows the Model.fit example below).
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)
eval_dataset = tf.data.Dataset.from_tensor_slices(
    (eval_features, eval_labels)).batch(1)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
  model = tf.keras.models.Sequential([tf.keras.layers.Dense(1)])
  optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)

model.compile(optimizer=optimizer, loss='mse')
model.fit(dataset)
model.evaluate(eval_dataset, return_dict=True)
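If you prefer a custom training loop, the same model can be trained by running a step function with Strategy.run and reducing the per-replica losses. The following is a minimal sketch that reuses the strategy and dataset defined above (the variable names ctl_model and ctl_optimizer are just illustrative); the custom-training-loop guide linked above remains the authoritative reference:

# Minimal custom-training-loop sketch reusing `strategy` and `dataset` from above.
with strategy.scope():
  ctl_model = tf.keras.models.Sequential([tf.keras.layers.Dense(1)])
  ctl_optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)

dist_dataset = strategy.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
  def step_fn(inputs):
    x, y = inputs
    with tf.GradientTape() as tape:
      predictions = ctl_model(x, training=True)
      # Average the per-example loss over the global batch so that gradients
      # summed across replicas match a single-device step.
      loss = tf.nn.compute_average_loss(
          tf.keras.losses.mean_squared_error(y, predictions))
    grads = tape.gradient(loss, ctl_model.trainable_variables)
    ctl_optimizer.apply_gradients(zip(grads, ctl_model.trainable_variables))
    return loss

  per_replica_losses = strategy.run(step_fn, args=(dist_inputs,))
  return strategy.reduce(
      tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

for batch in dist_dataset:
  print('loss:', train_step(batch).numpy())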
Next steps
To learn more about distributed training with tf.distribute.MirroredStrategy in TensorFlow 2, check out the following documentation:
- The Distributed training on one machine with Keras tutorial
- The Distributed training on one machine with a custom training loop tutorial
- The Distributed training with TensorFlow guide
- The Using multiple GPUs guide
- The Optimize the performance on the multi-GPU single host (with the TensorFlow Profiler) guide