tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearEnvironment
Implements a drifting linear environment.
Inherits From: NonStationaryStochasticEnvironment, BanditTFEnvironment, TFEnvironment
tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearEnvironment(
    observation_distribution: types.Distribution,
    observation_to_reward_distribution: types.Distribution,
    drift_distribution: types.Distribution,
    additive_reward_distribution: types.Distribution
)
| Args | |
| --- | --- |
| observation_distribution | A distribution from tfp.distributions with shape [batch_size, observation_dim]. Note that the values of batch_size and observation_dim are deduced from the distribution. |
| observation_to_reward_distribution | A distribution from tfp.distributions with shape [observation_dim, num_actions]. The value observation_dim must match the second dimension of observation_distribution. |
| drift_distribution | A scalar distribution from tfp.distributions of type tf.float32. It represents the angle of rotation. |
| additive_reward_distribution | A distribution from tfp.distributions with shape [num_actions]. This models the non-contextual behavior of the bandit. |
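All four constructor arguments are tfp distributions; the batch size, observation dimension, and number of actions are inferred from their shapes. The following is a minimal construction sketch: the sizes and the specific Normal distributions are illustrative assumptions, not values prescribed by this API.

```python
import tensorflow as tf
import tensorflow_probability as tfp
from tf_agents.bandits.environments import drifting_linear_environment as dle

tfd = tfp.distributions

# Illustrative sizes (assumptions for this sketch only).
batch_size, observation_dim, num_actions = 2, 3, 4

env = dle.DriftingLinearEnvironment(
    # Contextual observations, shape [batch_size, observation_dim].
    observation_distribution=tfd.Normal(
        loc=tf.zeros([batch_size, observation_dim]), scale=1.0),
    # Linear observation-to-reward mapping, shape [observation_dim, num_actions].
    observation_to_reward_distribution=tfd.Normal(
        loc=tf.zeros([observation_dim, num_actions]), scale=1.0),
    # Scalar float32 drift: the rotation angle applied to the mapping over time.
    drift_distribution=tfd.Normal(loc=0.01, scale=0.001),
    # Non-contextual additive reward per arm, shape [num_actions].
    additive_reward_distribution=tfd.Normal(
        loc=tf.zeros([num_actions]), scale=0.1))
```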
| Attributes | |
| --- | --- |
| batch_size | |
| batched | |
| environment_dynamics | |
| name | |
Methods
action_spec
action_spec()
Describes the specs of the Tensors expected by step(action).
action can be a single Tensor, or a nested dict, list or tuple of Tensors.
| Returns |
| --- |
| A single TensorSpec, or a nested dict, list or tuple of TensorSpec objects, which describe the shape and dtype of each Tensor expected by step(). |
current_time_step
current_time_step()
Returns the current TimeStep.
| Returns |
| --- |
| A TimeStep namedtuple containing: step_type: A StepType value. reward: Reward at this time_step. discount: A discount in the range [0, 1]. observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to observation_spec(). |
observation_spec
observation_spec()
Defines the TensorSpec
of observations provided by the environment.
| Returns |
| --- |
| A TensorSpec, or a nested dict, list or tuple of TensorSpec objects, which describe the observation. |
render
render()
Renders a frame from the environment.
| Raises | |
| --- | --- |
| NotImplementedError | If the environment does not support rendering. |
reset
reset()
Resets the environment and returns the current time_step.
| Returns |
| --- |
| A TimeStep namedtuple containing: step_type: A StepType value. reward: Reward at this time_step. discount: A discount in the range [0, 1]. observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to observation_spec(). |
reward_spec
reward_spec()
Defines the TensorSpec
of rewards provided by the environment.
| Returns |
| --- |
| A TensorSpec, or a nested dict, list or tuple of TensorSpec objects, which describe the reward. |
step
step(
    action
)
Steps the environment according to the action.
If the environment returned a TimeStep with StepType.LAST at the previous step, this call to step should reset the environment (it is expected that whoever implements this method calls reset in that case), start a new sequence, and ignore action.
This method will also start a new sequence if called after the environment has been constructed and reset() has not been called. In this case action will be ignored.
Expected sequences look like:
time_step -> action -> next_time_step
The action should depend on the previous time_step for correctness.
| Args | |
| --- | --- |
| action | A Tensor, or a nested dict, list or tuple of Tensors corresponding to action_spec(). |
| Returns |
| --- |
| A TimeStep namedtuple containing: step_type: A StepType value. reward: Reward at this time_step. discount: A discount in the range [0, 1]. observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to observation_spec(). |
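A minimal interaction sketch, assuming the env object from the construction example above; the constant zero action is a placeholder, not a trained bandit policy.

```python
# Reset once, then repeatedly step with an action of shape [batch_size].
time_step = env.reset()
for _ in range(3):
  action = tf.zeros([batch_size], dtype=tf.int32)  # placeholder: always pull arm 0
  time_step = env.step(action)
  print(time_step.reward)  # rewards for the chosen arms, shape [batch_size]
```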
time_step_spec
time_step_spec()
Describes the TimeStep specs of Tensors returned by step().
| Returns |
| --- |
| A TimeStep namedtuple containing TensorSpec objects defining the Tensors returned by step(), i.e. (step_type, reward, discount, observation). |
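For orientation, the specs of the environment sketched earlier can be inspected directly (again assuming that env object; the comments describe the shapes implied by the constructor's distribution arguments).

```python
print(env.batch_size)          # deduced from observation_distribution
print(env.observation_spec())  # per-member observation spec, shape [observation_dim]
print(env.action_spec())       # integer spec selecting one of the num_actions arms
print(env.time_step_spec())    # TimeStep of (step_type, reward, discount, observation) specs
```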
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-04-26 UTC."],[],[],null,["# tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearEnvironment\n\n\u003cbr /\u003e\n\n|----------------------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/bandits/environments/drifting_linear_environment.py#L267-L301) |\n\nImplements a drifting linear environment.\n\nInherits From: [`NonStationaryStochasticEnvironment`](../../../../tf_agents/bandits/environments/non_stationary_stochastic_environment/NonStationaryStochasticEnvironment), [`BanditTFEnvironment`](../../../../tf_agents/bandits/environments/bandit_tf_environment/BanditTFEnvironment), [`TFEnvironment`](../../../../tf_agents/environments/TFEnvironment) \n\n tf_agents.bandits.environments.drifting_linear_environment.DriftingLinearEnvironment(\n observation_distribution: types.Distribution,\n observation_to_reward_distribution: types.Distribution,\n drift_distribution: types.Distribution,\n additive_reward_distribution: types.Distribution\n )\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `observation_distribution` | A distribution from [`tfp.distributions`](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions) with shape `[batch_size, observation_dim]`. Note that the values of `batch_size` and `observation_dim` are deduced from the distribution. |\n| `observation_to_reward_distribution` | A distribution from [`tfp.distributions`](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions) with shape `[observation_dim, num_actions]`. The value `observation_dim` must match the second dimension of `observation_distribution`. |\n| `drift_distribution` | A scalar distribution from [`tfp.distributions`](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions) of type tf.float32. It represents the angle of rotation. |\n| `additive_reward_distribution` | A distribution from [`tfp.distributions`](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions) with shape `[num_actions]`. This models the non-contextual behavior of the bandit. 
|\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Attributes ---------- ||\n|------------------------|---------------|\n| `batch_size` | \u003cbr /\u003e \u003cbr /\u003e |\n| `batched` | \u003cbr /\u003e \u003cbr /\u003e |\n| `environment_dynamics` | \u003cbr /\u003e \u003cbr /\u003e |\n| `name` | \u003cbr /\u003e \u003cbr /\u003e |\n\n\u003cbr /\u003e\n\nMethods\n-------\n\n### `action_spec`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L146-L157) \n\n action_spec()\n\nDescribes the specs of the Tensors expected by `step(action)`.\n\n`action` can be a single Tensor, or a nested dict, list or tuple of\nTensors.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| An single `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the shape and dtype of each Tensor expected by `step()`. ||\n\n\u003cbr /\u003e\n\n### `current_time_step`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L185-L196) \n\n current_time_step()\n\nReturns the current `TimeStep`.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `TimeStep` namedtuple containing: step_type: A `StepType` value. reward: Reward at this time_step. discount: A discount in the range \\[0, 1\\]. observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`. ||\n\n\u003cbr /\u003e\n\n### `observation_spec`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L159-L166) \n\n observation_spec()\n\nDefines the `TensorSpec` of observations provided by the environment.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the observation. ||\n\n\u003cbr /\u003e\n\n### `render`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L243-L249) \n\n render()\n\nRenders a frame from the environment.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Raises ||\n|-----------------------|------------------------------------------------|\n| `NotImplementedError` | If the environment does not support rendering. |\n\n\u003cbr /\u003e\n\n### `reset`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L198-L209) \n\n reset()\n\nResets the environment and returns the current time_step.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `TimeStep` namedtuple containing: step_type: A `StepType` value. reward: Reward at this time_step. discount: A discount in the range \\[0, 1\\]. observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`. ||\n\n\u003cbr /\u003e\n\n### `reward_spec`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L168-L175) \n\n reward_spec()\n\nDefines the `TensorSpec` of rewards provided by the environment.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the reward. 
||\n\n\u003cbr /\u003e\n\n### `step`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L211-L241) \n\n step(\n action\n )\n\nSteps the environment according to the action.\n\nIf the environment returned a `TimeStep` with [`StepType.LAST`](../../../../tf_agents/trajectories/StepType#LAST) at the\nprevious step, this call to `step` should reset the environment (note that\nit is expected that whoever defines this method, calls reset in this case),\nstart a new sequence and `action` will be ignored.\n\nThis method will also start a new sequence if called after the environment\nhas been constructed and `reset()` has not been called. In this case\n`action` will be ignored.\n\nExpected sequences look like:\n\ntime_step -\\\u003e action -\\\u003e next_time_step\n\nThe action should depend on the previous time_step for correctness.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|----------|----------------------------------------------------------------------------------------|\n| `action` | A Tensor, or a nested dict, list or tuple of Tensors corresponding to `action_spec()`. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `TimeStep` namedtuple containing: step_type: A `StepType` value. reward: Reward at this time_step. discount: A discount in the range \\[0, 1\\]. observation: A Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`. ||\n\n\u003cbr /\u003e\n\n### `time_step_spec`\n\n[View source](https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/environments/tf_environment.py#L136-L144) \n\n time_step_spec()\n\nDescribes the `TimeStep` specs of Tensors returned by `step()`.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| A `TimeStep` namedtuple containing `TensorSpec` objects defining the Tensors returned by `step()`, i.e. (step_type, reward, discount, observation). ||\n\n\u003cbr /\u003e"]]