tf_agents.trajectories.trajectory.from_episode
Create a Trajectory from tensors representing a single episode.
tf_agents.trajectories.trajectory.from_episode(
    observation: tf_agents.typing.types.NestedSpecTensorOrArray,
    action: tf_agents.typing.types.NestedSpecTensorOrArray,
    policy_info: tf_agents.typing.types.NestedSpecTensorOrArray,
    reward: tf_agents.typing.types.NestedSpecTensorOrArray,
    discount: Optional[types.SpecTensorOrArray] = None
) -> tf_agents.trajectories.Trajectory
If none of the inputs are tensors, then numpy arrays are generated instead.
If discount is not provided, the first entry in reward is used to estimate T:
    reward_0 = tf.nest.flatten(reward)[0]
    T = shape(reward_0)[0]
In this case, a discount of all ones having dtype float32 is generated.
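The default-discount rule described above can be sketched with NumPy alone. This is an illustration of the documented behavior, not the library's implementation: first_leaf and default_discount are hypothetical helpers, and first_leaf is only a simplified stand-in for tf.nest.flatten(reward)[0] that assumes the nest is built from tuples or lists with NumPy arrays as leaves.

```python
import numpy as np


def first_leaf(nest):
    """Hypothetical stand-in for tf.nest.flatten(nest)[0].

    Only descends through tuples and lists; leaves are assumed to be arrays.
    """
    while isinstance(nest, (tuple, list)):
        nest = nest[0]
    return nest


def default_discount(reward):
    """Build an all-ones float32 discount of length T, as described above."""
    reward_0 = first_leaf(reward)
    T = np.shape(reward_0)[0]  # time dimension taken from the first reward entry
    return np.ones((T,), dtype=np.float32)


# A nested reward with T = 5 time steps in every leaf.
reward = (np.zeros((5,), dtype=np.float32), np.zeros((5, 2), dtype=np.float32))
discount = default_discount(reward)
```

Here discount has shape (5,) and dtype float32, matching the time dimension of the first reward entry.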
Note: all tensors/numpy arrays passed to this function must share the same
time dimension T. When the generated trajectory is passed through
to_transition, the result is a (time_steps, next_time_steps) pair with only
T - 1 entries in the time dimension, so the reward at step T is dropped. If
the reward at step T matters, make sure the episode passed to this function
contains one additional step.
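The effect of this truncation on rewards can be illustrated with plain Python lists. This is a minimal sketch of the behavior stated in the note, not a call into to_transition; transition_rewards is a hypothetical helper, and the zero used for the padding step is an assumption chosen for illustration.

```python
def transition_rewards(rewards):
    """Rewards that survive a T -> T - 1 truncation, per the note above."""
    return rewards[:-1]


# Episode with T = 3 whose final reward matters.
episode_rewards = [0.0, 0.0, 1.0]
kept = transition_rewards(episode_rewards)

# Appending one extra step (assumed reward 0.0) keeps the original final reward.
padded_rewards = episode_rewards + [0.0]
kept_padded = transition_rewards(padded_rewards)
```

Without padding, kept no longer contains the final reward of 1.0; with one extra step, kept_padded retains it. In a real episode the observation, action, and policy_info inputs would need the same extra step so that every input still shares the time dimension T.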
Args
observation
(possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
action
(possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
policy_info
(possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
reward
(possibly nested tuple of) Tensor or np.ndarray; all shaped [T, ...].
discount
A floating point vector Tensor or np.ndarray; shaped [T] (optional).
Returns
An instance of Trajectory.
Last updated 2024-04-26 UTC.