tf_agents.trajectories.Transition
A tuple that represents a transition.
    tf_agents.trajectories.Transition(
        time_step, action_step, next_time_step
    )
A `Transition` represents an `S, A, S'` sequence of operations. Tensors within a `Transition` are typically shaped `[B, ...]`, where `B` is the batch size.
In some cases, `Transition` objects are used to store time-shifted intermediate values for RNN computations, in which case the stored tensors are shaped `[B, T, ...]`.
In other cases, `Transition` objects store n-step transitions `S_t, A_t, S_{t+N}`, where the associated reward and discount in `next_time_step` are calculated as:

    next_time_step.reward = r_t +
                            g^{1} * d_t * r_{t+1} +
                            g^{2} * d_t * d_{t+1} * r_{t+2} +
                            g^{3} * d_t * d_{t+1} * d_{t+2} * r_{t+3} +
                            ...
                            g^{N-1} * d_t * ... * d_{t+N-2} * r_{t+N-1}

    next_time_step.discount = g^{N-1} * d_t * d_{t+1} * ... * d_{t+N-1}
See `to_n_step_transition` for an example that converts `Trajectory` objects to this format.
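As a plain-Python illustration of the formulas above (the function name and list-based inputs here are inventions for this sketch, not part of the TF-Agents API):

    # Minimal sketch of the n-step reward/discount formulas (illustrative only).
    def n_step_reward_and_discount(rewards, discounts, g):
        """rewards/discounts hold r_t..r_{t+N-1} and d_t..d_{t+N-1}; g is gamma."""
        reward = 0.0
        running = 1.0  # accumulates g^{k} * d_t * ... * d_{t+k-1}
        for r, d in zip(rewards, discounts):
            reward += running * r
            running *= g * d
        # next_time_step.discount = g^{N-1} * d_t * ... * d_{t+N-1}
        discount = g ** (len(rewards) - 1)
        for d in discounts:
            discount *= d
        return reward, discount

    # With N=1 this degenerates to (r_t, d_t), an ordinary one-step transition.
    assert n_step_reward_and_discount([2.0], [0.5], 0.9) == (2.0, 0.5)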
| Attributes | |
|---|---|
| `time_step` | The initial state, reward, and discount. |
| `action_step` | The action, policy info, and possibly the policy state taken. (Note: `action_step.state` should typically not be stored in, e.g., a replay buffer, except for a copy inside `policy_step.info` as a special case for algorithms that choose to do this.) |
| `next_time_step` | The final state, reward, and discount. |
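As a concrete illustration, here is a minimal sketch of building a batched `Transition` by hand; the observation and action values are placeholders, and only the `[B, ...]` structure matters:

    import tensorflow as tf
    from tf_agents.trajectories import Transition, policy_step
    from tf_agents.trajectories import time_step as ts

    batch_size = 2
    observation = tf.zeros([batch_size, 4])       # [B, ...] observations
    action = tf.constant([0, 1], dtype=tf.int32)  # [B] actions

    transition = Transition(
        time_step=ts.restart(observation, batch_size=batch_size),
        action_step=policy_step.PolicyStep(action=action, state=(), info=()),
        next_time_step=ts.transition(
            observation, reward=tf.ones([batch_size]),
            outer_dims=[batch_size]),
    )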
Methods
replace
    replace(
        **kwargs
    ) -> 'Transition'
Exposes `namedtuple._replace`.
Usage:

    new_transition = transition.replace(action_step=())

This returns a new transition with an empty `action_step`.
| Args | |
|---|---|
| `**kwargs` | Key/value pairs of fields in the transition. |
| Returns | |
|---|---|
| A new `Transition`. | |
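Tying this to the note on `action_step` above, one plausible use of `replace` is stripping the policy state before writing a transition to a replay buffer (a sketch; `transition` is assumed to be an existing `Transition`):

    # Drop the policy state before storage, per the attribute note above.
    # PolicyStep is a namedtuple, so _replace works on it as well.
    stripped = transition.replace(
        action_step=transition.action_step._replace(state=())
    )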