tf_agents.policies.random_py_policy.RandomPyPolicy
Returns random samples of the given action_spec.
Inherits From: PyPolicy
tf_agents.policies.random_py_policy.RandomPyPolicy(
    time_step_spec: tf_agents.trajectories.TimeStep,
    action_spec: tf_agents.typing.types.NestedArraySpec,
    policy_state_spec: tf_agents.typing.types.NestedArraySpec = (),
    info_spec: tf_agents.typing.types.NestedArraySpec = (),
    seed: Optional[types.Seed] = None,
    outer_dims: Optional[Sequence[int]] = None,
    observation_and_action_constraint_splitter: Optional[types.Splitter] = None
)
Args

time_step_spec: Reference time_step_spec. If not None and outer_dims is not provided, this is used to infer the outer_dims required for the given time_step when action is called.

action_spec: A nest of BoundedArraySpec representing the actions to sample from.

policy_state_spec: Nest of tf.TypeSpec representing the data in the policy state field.

info_spec: Nest of tf.TypeSpec representing the data in the policy info field.

seed: Optional seed used to instantiate a random number generator.

outer_dims: An optional list/tuple specifying outer dimensions to add to the spec shape before sampling. If unspecified, the outer_dims are derived from the outer_dims in the given observation when action is called.

observation_and_action_constraint_splitter: A function used to process observations with action constraints. These constraints can indicate, for example, a mask of valid/invalid actions for a given state of the environment. The function takes in a full observation and returns a tuple consisting of 1) the part of the observation intended as input to the network and 2) the constraint. An example observation_and_action_constraint_splitter could be as simple as:

    def observation_and_action_constraint_splitter(observation):
      return observation['network_input'], observation['constraint']

Note: when using observation_and_action_constraint_splitter, make sure the provided q_network is compatible with the network-specific half of the output of the observation_and_action_constraint_splitter. In particular, observation_and_action_constraint_splitter will be called on the observation before passing to the network. If observation_and_action_constraint_splitter is None, action constraints are not applied.
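A minimal usage sketch (the spec shape and bounds below are illustrative; time_step_spec is left as None because this policy ignores the observation):

    import numpy as np
    from tf_agents.policies import random_py_policy
    from tf_agents.specs import array_spec

    # Bounded integer action spec: two values, each in [-10, 10].
    action_spec = array_spec.BoundedArraySpec((2,), np.int32, minimum=-10, maximum=10)

    # RandomPyPolicy samples uniformly from the action spec.
    policy = random_py_policy.RandomPyPolicy(time_step_spec=None, action_spec=action_spec)

    action_step = policy.action(time_step=None)
    print(action_step.action)  # e.g. array([ 3, -7], dtype=int32)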
Attributes

action_spec: Describes the ArraySpecs of the np.Array returned by action(). action can be a single np.Array, or a nested dict, list or tuple of np.Array.

collect_data_spec: Describes the data collected when using this policy with an environment.

info_spec: Describes the Arrays emitted as info by action().

observation_and_action_constraint_splitter

policy_state_spec: Describes the arrays expected by functions with policy_state as input.

policy_step_spec: Describes the output of action().

time_step_spec: Describes the TimeStep np.Arrays expected by action(time_step).

trajectory_spec: Describes the data collected when using this policy with an environment.
Methods
action
action(
    time_step: tf_agents.trajectories.TimeStep,
    policy_state: tf_agents.typing.types.NestedArray = (),
    seed: Optional[types.Seed] = None
) -> tf_agents.trajectories.PolicyStep
Generates next action given the time_step and policy_state.
Args

time_step: A TimeStep tuple corresponding to time_step_spec().

policy_state: An optional previous policy_state.

seed: Seed to use if action uses sampling (optional).
Returns

A PolicyStep named tuple containing:
  action: A nest of action Arrays matching the action_spec().
  state: A nest of policy states to be fed into the next call to action.
  info: Optional side information such as action log probabilities.
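A sketch of drawing a batch of actions by setting outer_dims at construction time (the batch size of 3 is illustrative):

    import numpy as np
    from tf_agents.policies import random_py_policy
    from tf_agents.specs import array_spec

    action_spec = array_spec.BoundedArraySpec((2,), np.int32, minimum=-10, maximum=10)

    # outer_dims=(3,) prepends a batch dimension of 3 to every sample.
    batched_policy = random_py_policy.RandomPyPolicy(
        time_step_spec=None, action_spec=action_spec, outer_dims=(3,))

    action_step = batched_policy.action(time_step=None)
    # action_step.action has shape (3, 2): three independent draws from the spec.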
get_initial_state
get_initial_state(
    batch_size: Optional[int] = None
) -> tf_agents.typing.types.NestedArray
Returns an initial state usable by the policy.
Args

batch_size: An optional batch size.

Returns

An initial policy state.
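A brief sketch, assuming policy_state_spec is left at its default of () (in which case the initial state is simply an empty structure):

    import numpy as np
    from tf_agents.policies import random_py_policy
    from tf_agents.specs import array_spec

    policy = random_py_policy.RandomPyPolicy(
        time_step_spec=None,
        action_spec=array_spec.BoundedArraySpec((), np.int32, minimum=0, maximum=3))

    # With the default empty policy_state_spec, this returns an empty state.
    initial_state = policy.get_initial_state(batch_size=2)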