tf.contrib.distribute.MultiWorkerAllReduce
All-reduce algorithms for distributed TensorFlow.
Inherits From: AllReduceCrossDeviceOps
tf.contrib.distribute.MultiWorkerAllReduce(
worker_devices, num_gpus_per_worker, all_reduce_spec=('pscpu/pscpu', 2, -1),
num_packs=0, agg_small_grads_max_bytes=0, agg_small_grads_max_group=10
)
Args:
worker_devices: A list of device strings for the workers participating in the all-reduce.
num_gpus_per_worker: The number of GPU devices per worker.
all_reduce_spec: A tuple, a named tuple, or a list of tuples specifying the all-reduce algorithm.
- The first element of a tuple is the name of the all-reduce algorithm. Valid algorithm names are: "nccl", "nccl/xring", "nccl/rechd", "nccl/pscpu", "xring", "pscpu", "psgpu", "pscpu/pscpu". Algorithms with a "/" are hierarchical: two all-reduces are executed, the first aggregating tensors within a worker and the second aggregating across workers.
- The second element of a tuple is the number of shards used in the all-reduce. If its value is M, each tensor after packing is split into M shards, M parallel all-reduces are performed, and the shards are finally concatenated back into a complete tensor.
- The third element is the maximum size, in bytes, of tensors to which the algorithm named in the first element applies. For example, if all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)], tensors no larger than 1024 bytes get a 2-shard "nccl" all-reduce and all other tensors get a 2-shard "pscpu/pscpu" all-reduce. The third elements must be in increasing order across tuples and end with -1, which indicates infinity. (See the construction sketch after this list.)
num_packs: See AllReduceCrossDeviceOps.
agg_small_grads_max_bytes: See AllReduceCrossDeviceOps.
agg_small_grads_max_group: See AllReduceCrossDeviceOps.
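For concreteness, here is a minimal construction sketch in Python. The job name, device strings, and GPU count are illustrative assumptions, not values prescribed by this API; the all_reduce_spec matches the two-tuple example above.

import tensorflow as tf  # TensorFlow 1.x; tf.contrib is unavailable in TF 2.x

# Assumed two-worker job named "worker"; adjust device strings to your cluster.
worker_devices = [
    "/job:worker/replica:0/task:0",
    "/job:worker/replica:0/task:1",
]

all_reduce = tf.contrib.distribute.MultiWorkerAllReduce(
    worker_devices=worker_devices,
    num_gpus_per_worker=2,
    # Tensors up to 1024 bytes use a 2-shard "nccl" all-reduce; larger
    # tensors fall through to a 2-shard hierarchical "pscpu/pscpu" all-reduce.
    all_reduce_spec=[("nccl", 2, 1024), ("pscpu/pscpu", 2, -1)],
)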
Methods
batch_reduce
batch_reduce(
reduce_op, value_destination_pairs
)
Reduces PerReplica objects in a batch.
Reduces the first element of each pair in value_destination_pairs to the corresponding second element, which indicates the destinations.
Args:
reduce_op: Indicates how per_replica_value will be reduced. Accepted values are tf.distribute.ReduceOp.SUM and tf.distribute.ReduceOp.MEAN.
value_destination_pairs: A list or a tuple of tuples of PerReplica objects (or tensors with a device set if there is one device) and destinations.
Returns:
A list of Mirrored objects.
Raises:
ValueError: If value_destination_pairs is not a list or a tuple of tuples of PerReplica objects and destinations.
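A minimal usage sketch, reusing the all_reduce instance constructed above. The per_replica_grads list and the destination device string are assumptions for illustration; in practice, PerReplica values are produced by a distribution strategy.

# Sketch only: `per_replica_grads` is assumed to be a list of PerReplica
# values produced by a distribution strategy.
destination = "/job:worker/replica:0/task:0/device:CPU:0"  # assumed device
pairs = [(grad, destination) for grad in per_replica_grads]
reduced = all_reduce.batch_reduce(tf.distribute.ReduceOp.SUM, pairs)
# `reduced` is a list of Mirrored objects, one per input pair.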
broadcast
broadcast(
tensor, destinations
)
Broadcasts the tensor to destinations.
Args:
tensor: The tensor to broadcast.
destinations: The broadcast destinations.
Returns:
A Mirrored object.
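A sketch of broadcasting a tensor, again reusing the all_reduce instance from the construction example; passing the list of worker device strings as destinations is an assumption for illustration.

# Sketch: mirror a constant onto each assumed worker device.
tensor = tf.constant([1.0, 2.0, 3.0])
mirrored = all_reduce.broadcast(tensor, destinations=worker_devices)
# `mirrored` holds a copy of `tensor` on each destination.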
reduce
reduce(
reduce_op, per_replica_value, destinations
)
Reduces per_replica_value to destinations.
It runs the reduction operation defined by reduce_op and puts the result on destinations.
Returns:
A Mirrored object.
Raises:
ValueError: If per_replica_value can't be converted to a PerReplica object.
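A sketch of a single reduce call with the all_reduce instance from the construction example. The per_replica_value and the destination device string are assumptions; a real PerReplica value is produced by a distribution strategy.

# Sketch only: average one per-replica value onto one assumed device.
mean_value = all_reduce.reduce(
    tf.distribute.ReduceOp.MEAN,
    per_replica_value,  # assumed PerReplica value from a strategy
    destinations="/job:worker/replica:0/task:0/device:CPU:0",
)
# `mean_value` is a Mirrored object placed on the destination.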