tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearDynamics
A drifting linear environment dynamics.
Inherits From: EnvironmentDynamics
tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearDynamics(
    observation_distribution: types.Distribution,
    observation_to_reward_distribution: types.Distribution,
    drift_distribution: types.Distribution,
    additive_reward_distribution: types.Distribution
)
This is a drifting linear environment which computes rewards as:

rewards(t) = observation(t) * observation_to_reward(t) + additive_reward(t)

where t is the environment time. observation_to_reward slowly rotates over time. The environment time is incremented in the base class after the reward is computed. The parameters observation_to_reward and additive_reward are updated at each time step.
In order to preserve the norm of observation_to_reward (and the range of values of the reward), the drift is applied in the form of rotations, i.e.,

observation_to_reward(t) = R(theta(t)) * observation_to_reward(t - 1)

where theta is the angle of the rotation. The angle is sampled from a provided input distribution.
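To make the rotation idea concrete, the sketch below applies a 2-D rotation to a toy observation_to_reward vector with NumPy. This is only an illustration of the norm-preserving drift, not the library's internal implementation; the angle value is arbitrary.

import numpy as np

theta = 0.05  # rotation angle; in the environment this is sampled from drift_distribution
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
observation_to_reward = np.array([1.0, 0.5])

# Drift one step: rotate the observation-to-reward mapping.
drifted = rotation @ observation_to_reward

# The rotation preserves the norm, so the scale of the rewards stays stable.
assert np.isclose(np.linalg.norm(drifted), np.linalg.norm(observation_to_reward))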
Args

observation_distribution: A distribution from tfp.distributions with shape [batch_size, observation_dim]. Note that the values of batch_size and observation_dim are deduced from the distribution.
observation_to_reward_distribution: A distribution from tfp.distributions with shape [observation_dim, num_actions]. The value observation_dim must match the second dimension of observation_distribution.
drift_distribution: A scalar distribution from tfp.distributions of type tf.float32. It represents the angle of rotation.
additive_reward_distribution: A distribution from tfp.distributions with shape [num_actions]. This models the non-contextual behavior of the bandit.
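As a rough usage sketch, the dynamics can be constructed from tfp.distributions and wrapped in a non-stationary stochastic environment. The shapes, the choice of Normal distributions, and the numeric values below are illustrative assumptions; the only hard requirements are the shape constraints listed above.

import tensorflow as tf
import tensorflow_probability as tfp

from tf_agents.bandits.environments import drifting_linear_environment as dle
from tf_agents.bandits.environments import non_stationary_stochastic_environment as nsse

tfd = tfp.distributions
batch_size, observation_dim, num_actions = 4, 3, 2  # illustrative sizes

dynamics = dle.DriftingLinearDynamics(
    observation_distribution=tfd.Normal(
        loc=tf.zeros([batch_size, observation_dim]),
        scale=tf.ones([batch_size, observation_dim])),
    observation_to_reward_distribution=tfd.Normal(
        loc=tf.zeros([observation_dim, num_actions]),
        scale=tf.ones([observation_dim, num_actions])),
    drift_distribution=tfd.Normal(loc=0.0, scale=0.05),  # small rotation angle per step
    additive_reward_distribution=tfd.Normal(
        loc=tf.zeros([num_actions]),
        scale=0.1 * tf.ones([num_actions])))

# The dynamics are typically wrapped in a non-stationary environment,
# which advances the environment time and calls the methods below.
environment = nsse.NonStationaryStochasticEnvironment(dynamics)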
Attributes

action_spec: Specification of the actions.
batch_size: Returns the batch size used for observations and rewards.
observation_spec: Specification of the observations.
Methods
compute_optimal_action
compute_optimal_action(
    observation: tf_agents.typing.types.NestedTensor
) -> tf_agents.typing.types.NestedTensor
compute_optimal_reward
compute_optimal_reward(
    observation: tf_agents.typing.types.NestedTensor
) -> tf_agents.typing.types.NestedTensor
observation
observation(
    unused_t
) -> tf_agents.typing.types.NestedTensor
Returns an observation batch for the given time.
Args

env_time: The scalar int64 tensor of the environment time step. This is incremented by the environment after the reward is computed.

Returns

The observation batch with spec according to observation_spec.
reward
reward(
    observation: tf_agents.typing.types.NestedTensor,
    t: tf_agents.typing.types.Int
) -> tf_agents.typing.types.NestedTensor
Reward for the given observation and time step.
Args

observation: A batch of observations with spec according to observation_spec.
env_time: The scalar int64 tensor of the environment time step. This is incremented by the environment after the reward is computed.

Returns

A batch of rewards with spec shape [batch_size, num_actions] containing rewards for all arms.
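For a quick sanity check, the dynamics object from the constructor sketch above can also be queried directly (in normal use the wrapping environment calls these methods and advances the time for you). This is a hedged example assuming eager execution and the names defined earlier.

t = tf.constant(0, dtype=tf.int64)
observation = dynamics.observation(t)           # shape [batch_size, observation_dim]
rewards = dynamics.reward(observation, t)       # shape [batch_size, num_actions]
optimal_action = dynamics.compute_optimal_action(observation)
optimal_reward = dynamics.compute_optimal_reward(observation)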