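Module: tf_agents.policies.utils
View source on GitHub: https://github.com/tensorflow/agents/blob/v0.19.0/tf_agents/policies/utils.py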
Utilities for policies.
Classes
class BanditPolicyType
: Enumeration of bandit policy types.
class InfoFields
: Strings which can be used in the policy info fields.
class PerArmPolicyInfo
: PerArmPolicyInfo(log_probability, predicted_rewards_mean, multiobjective_scalarized_predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type, chosen_arm_features)
class PolicyInfo
: PolicyInfo(log_probability, predicted_rewards_mean, multiobjective_scalarized_predicted_rewards_mean, predicted_rewards_optimistic, predicted_rewards_sampled, bandit_policy_type)
Functions
bandit_policy_uniform_mask(...)
: Sets the bandit policy type tensor to BanditPolicyType.UNIFORM based on a mask.
check_no_mask_with_arm_features(...)
create_bandit_policy_type_tensor_spec(...)
: Creates the tensor spec for the bandit policy type.
create_chosen_arm_features_info_spec(...)
: Creates the chosen arm features info spec from the arm observation spec.
get_model_index(...)
: Returns the model index for a specific arm.
get_num_actions_from_tensor_spec(...)
: Validates action_spec and returns the number of actions.
has_bandit_policy_type(...)
: Checks whether the policy info has a bandit_policy_type field/tensor.
has_chosen_arm_features(...)
: Checks whether the policy info has a chosen_arm_features field/tensor.
masked_argmax(...)
: Computes the argmax where the allowed elements are given by a mask.
populate_policy_info(...)
: Populates policy info given all needed input.
set_bandit_policy_type(...)
: Sets the InfoFields.BANDIT_POLICY_TYPE on info to bandit_policy_type.
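A masked argmax picks, per row, the highest-scoring element among those the mask allows. The pure-Python sketch below illustrates one common way to do this: disallowed entries are replaced by negative infinity so they can never win. It is not the tf_agents implementation, which operates on tensors and may handle edge cases (such as a fully masked row) differently; raising an error for that case is a choice made here. The second function, num_actions_from_bounds, is a hypothetical helper showing only the counting step behind get_num_actions_from_tensor_spec, assuming inclusive integer bounds [minimum, maximum]; the library function's additional validation of action_spec is not reproduced.

```python
import math
from typing import List, Sequence


def masked_argmax(values: Sequence[Sequence[float]],
                  mask: Sequence[Sequence[bool]]) -> List[int]:
    """Row-wise argmax over the last axis, considering only allowed entries.

    Disallowed entries become -inf so they never win; ties go to the
    lowest index because max() returns the first maximal element.
    """
    result = []
    for row, row_mask in zip(values, mask):
        if len(row) != len(row_mask):
            raise ValueError("values and mask must have the same shape")
        if not any(row_mask):
            # Sketch's design choice: fail loudly rather than return an
            # arbitrary index for a row with no allowed elements.
            raise ValueError("each row needs at least one allowed element")
        masked = [v if m else -math.inf for v, m in zip(row, row_mask)]
        result.append(max(range(len(masked)), key=masked.__getitem__))
    return result


def num_actions_from_bounds(minimum: int, maximum: int) -> int:
    """Hypothetical helper: number of discrete actions in [minimum, maximum].

    Bounds are treated as inclusive, so the count is maximum - minimum + 1.
    """
    if maximum < minimum:
        raise ValueError("maximum must be >= minimum")
    return maximum - minimum + 1


rewards = [[0.2, 0.9, 0.4], [0.8, 0.1, 0.5]]
allowed = [[True, False, True], [True, True, False]]
print(masked_argmax(rewards, allowed))  # [2, 0]
print(num_actions_from_bounds(0, 4))    # 5
```

In the first row, arm 1 has the highest reward but is masked out, so arm 2 is selected; in the second row, the unmasked maximum is arm 0.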
Other Members
absolute_import
: Instance of __future__._Feature
division
: Instance of __future__._Feature
print_function
: Instance of __future__._Feature
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.