tf.distribute.coordinator.experimental_get_current_worker_index
Returns the current worker index, when called within a worker closure.
tf.distribute.coordinator.experimental_get_current_worker_index()
Some parameter server training workloads may require the worker to know its
index, for example to shard data for reduced-variance training.
This method may be used within a tf.function that is executed on a worker.
That is, either a dataset_fn that runs via
ClusterCoordinator.create_per_worker_dataset, or any other function
scheduled via ClusterCoordinator.schedule.
Example (sharding data by worker):
strategy = tf.distribute.ParameterServerStrategy(
    cluster_resolver=...)
coordinator = (
    tf.distribute.coordinator.ClusterCoordinator(strategy))

def dataset_fn(context):
  dataset = tf.data.Dataset.range(10)
  worker_index = (
      tf.distribute.coordinator.experimental_get_current_worker_index()
  )
  # Give each worker a distinct shard of the data. num_workers is assumed
  # to be defined elsewhere as the total number of workers in the cluster.
  dataset = dataset.shard(
      num_shards=num_workers,
      index=worker_index,
  )
  return dataset

@tf.function
def per_worker_dataset_fn():
  return strategy.distribute_datasets_from_function(dataset_fn)

per_worker_dataset = coordinator.create_per_worker_dataset(
    per_worker_dataset_fn)
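
The function also works in the second context noted above: any function
scheduled via ClusterCoordinator.schedule. A minimal sketch, reusing the
coordinator from the example above with a hypothetical step_fn:

@tf.function
def step_fn():
  worker_index = (
      tf.distribute.coordinator.experimental_get_current_worker_index()
  )
  # Hypothetical use: return the index so the coordinator can inspect it;
  # in practice it might seed per-worker randomness or tag worker logs.
  return worker_index

# schedule() returns a RemoteValue; fetch() blocks until the closure has run.
remote_value = coordinator.schedule(step_fn)
print(remote_value.fetch())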
Raises

RuntimeError
    If called from outside a tf.function or outside of a remote closure
    execution context (that is, on a non-worker machine).
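
For instance, calling the function eagerly on the coordinator, outside any
scheduled closure, is expected to raise this error. A minimal sketch:

# Hypothetical: invoked on the coordinator (non-worker) machine, outside
# any tf.function or worker closure, so RuntimeError is expected.
try:
  tf.distribute.coordinator.experimental_get_current_worker_index()
except RuntimeError as e:
  print("Not in a worker closure:", e)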