tf_agents.trajectories.trajectory.from_episode
Create a Trajectory from tensors representing a single episode.
tf_agents.trajectories.trajectory.from_episode(
    observation: tf_agents.typing.types.NestedSpecTensorOrArray,
    action: tf_agents.typing.types.NestedSpecTensorOrArray,
    policy_info: tf_agents.typing.types.NestedSpecTensorOrArray,
    reward: tf_agents.typing.types.NestedSpecTensorOrArray,
    discount: Optional[types.SpecTensorOrArray] = None
) -> tf_agents.trajectories.Trajectory
If none of the inputs are tensors, then numpy arrays are generated instead.
If discount is not provided, the first entry in reward is used to estimate T:
    reward_0 = tf.nest.flatten(reward)[0]
    T = shape(reward_0)[0]
In this case, a discount of all ones having dtype float32 is generated.
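The rule above can be sketched in plain NumPy. This is an illustration of the described behavior only, not the library's implementation: default_discount and its inner flatten are hypothetical helpers, and flatten is a simplified stand-in for tf.nest.flatten that handles nested tuples only.

```python
import numpy as np


def default_discount(reward):
    """Hypothetical helper: build an all-ones discount from a reward nest."""

    # Simplified stand-in for tf.nest.flatten (assumption: tuples only).
    def flatten(nest):
        if isinstance(nest, tuple):
            return [leaf for item in nest for leaf in flatten(item)]
        return [nest]

    reward_0 = flatten(reward)[0]   # first entry in the reward nest
    T = np.shape(reward_0)[0]       # leading dimension is the time dimension
    return np.ones([T], dtype=np.float32)


# A nested reward whose entries all share the time dimension T = 5.
reward = (np.zeros([5]), (np.zeros([5, 2]),))
discount = default_discount(reward)
```

Only the first flattened entry is inspected, so the other entries are assumed to share the same leading dimension.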
Note: All tensors and numpy arrays passed to this function must have the same time dimension T. When the generated trajectory is passed through to_transition, only a (time_steps, next_time_steps) pair with T - 1 entries in the time dimension is returned, which means the reward at step T is dropped. If the reward at step T is important, make sure the episode passed to this function contains an additional step.
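The effect described in the note can be illustrated with a small pure-Python sketch. It does not call to_transition; pair_steps is a hypothetical helper that only mimics the idea of pairing each step with its successor, and the padding value 0.0 is an arbitrary choice for the example.

```python
def pair_steps(rewards):
    """Hypothetical helper: pair step t with step t + 1.

    Returns (t, t + 1, reward_at_t) tuples. The final step has no
    successor, so its reward never appears in any pair.
    """
    return [(t, t + 1, rewards[t]) for t in range(len(rewards) - 1)]


# Episode with T = 3 steps; the important reward arrives at the last step.
episode_rewards = [0.0, 0.0, 1.0]
pairs = pair_steps(episode_rewards)

# Appending one extra step keeps the final reward inside a pair.
padded_rewards = episode_rewards + [0.0]
padded_pairs = pair_steps(padded_rewards)
```

With T = 3 the sketch yields two pairs and the reward 1.0 is lost; after padding to four steps it yields three pairs and the last pair carries that reward.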
Args
observation: (Possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
action: (Possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
policy_info: (Possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
reward: (Possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
discount: (Optional) A floating point vector Tensor or np.ndarray; shaped [T].
Returns
An instance of Trajectory.
Last updated 2024-04-26 UTC.