View source on GitHub
|
Initializes accelerators and communication fabrics for DTensor.
tf.experimental.dtensor.initialize_accelerator_system(
device_type: Optional[str] = None,
enable_coordination_service: Optional[bool] = True
) -> str
DTensor configures TensorFlow to run in the local mode or multi-client mode.
- In local mode, a mesh can only use devices attached to the current process.
- In multi-client mode, a mesh can span across devices from multiple clients.
If DTENSOR_JOBS is non-empty, DTensor configures TensorFlow to run in the
multi-client mode using the distributed runtime. In multi-client mode devices
on different clients can communicate with each other.
The following environment variables control the behavior of this function.
- DTENSOR_JOBS: string, a comma-separated list. Each item in the list has the format {hostname}:{port}. If empty, DTensor runs in local mode. Examples of valid DTENSOR_JOBS values:
  - 4 clients on localhost: localhost:10000,localhost:10001,localhost:10002,localhost:10003
  - 2 clients on host1, 2 clients on host2: host1:10000,host1:10001,host2:10000,host2:10003

  If the hostnames are BNS addresses, the items must be sorted in alphabetical order.
- DTENSOR_CLIENT_ID: integer between 0 and num_clients - 1 that identifies the client id of the current process. The default value is 0.
- DTENSOR_JOB_NAME: string, the name of the TensorFlow job. The job name controls the job name section of the TensorFlow DeviceSpecs, e.g., job:worker in /job:worker/replica:0/task:0/device:TPU:0 when the job name is worker. The default value is localhost in local mode, and worker in multi-client mode. All DTensor clients within the same multi-client cluster share the same job name.
- DTENSOR_USE_PARALLEL_EXECUTOR: string, with the value pw specifying that the backend is Pathways, and TensorFlow otherwise.
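As a rough illustration of how these variables fit together, the sketch below builds the per-client values for a four-client cluster. The helper name `dtensor_client_env` and the host list are hypothetical and not part of the DTensor API; only the variable names and formats come from the descriptions above.

```python
def dtensor_client_env(hosts, client_id, job_name="worker"):
    """Build the DTENSOR_* variables for one client (hypothetical helper)."""
    if not 0 <= client_id < len(hosts):
        raise ValueError("client_id must be between 0 and num_clients - 1")
    return {
        # Comma-separated {hostname}:{port} items, one per client.
        "DTENSOR_JOBS": ",".join(hosts),
        # Environment variables are strings, so the id is stringified.
        "DTENSOR_CLIENT_ID": str(client_id),
        # Must be the same for every client in the cluster.
        "DTENSOR_JOB_NAME": job_name,
    }


# Assumed example cluster: 2 clients on host1, 2 clients on host2.
hosts = ["host1:10000", "host1:10001", "host2:10000", "host2:10003"]
env = dtensor_client_env(hosts, client_id=2)
print(env["DTENSOR_JOBS"])
```

Each client process would presumably apply its own values (for example with `os.environ.update(env)`) before calling `tf.experimental.dtensor.initialize_accelerator_system()`. Every client gets the same `DTENSOR_JOBS` and `DTENSOR_JOB_NAME`, and only `DTENSOR_CLIENT_ID` differs.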
| Args | |
|---|---|
| device_type | Type of accelerator to use; can be CPU, GPU, or TPU. If None, uses tf.experimental.dtensor.preferred_device_type(). |
| enable_coordination_service | If true, enables the distributed coordination service so that workers know about each other's devices when there is more than one client. |
| Returns | |
|---|---|
| device_type | The type of accelerator that was initialized. |
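A minimal local-mode sketch of calling this function is shown below. The wrapper name `init_local_dtensor` is hypothetical; it assumes TensorFlow is installed with DTensor available and that DTENSOR_JOBS is unset or empty, so DTensor stays in local mode.

```python
def init_local_dtensor(device_type=None):
    """Initialize DTensor in local mode; return the initialized device type."""
    # Imported lazily so this module still loads where TensorFlow is absent.
    from tensorflow.experimental import dtensor

    # Passing None defers the choice to dtensor.preferred_device_type().
    return dtensor.initialize_accelerator_system(device_type)
```

Calling `init_local_dtensor("CPU")` should return the string naming the accelerator type that was initialized, which can then be passed as the device type when building a mesh.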