Module: tf_agents.bandits.environments.ranking_environment
Ranking Python Bandit environment with items as per-arm features.
The observations are drawn with the help of the arguments `global_sampling_fn` and `item_sampling_fn`.
The user is modeled as follows: the score of an item is the weighted inner product of the global feature and the item feature. The scores of all items in a recommendation are then treated as the unnormalized logits of a categorical distribution, from which the user's choice is sampled.
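The scoring and sampling described above can be sketched in plain NumPy. This is an illustration of the user model, not the library's implementation; the dimensions and the weight matrix `W` are made-up example values.

```python
import numpy as np

rng = np.random.default_rng(0)

global_dim, item_dim, num_slots = 4, 3, 5
W = rng.normal(size=(global_dim, item_dim))        # score weight matrix (example)
global_feature = rng.normal(size=global_dim)       # one user/context observation
slotted_items = rng.normal(size=(num_slots, item_dim))  # recommended items

# Score of each item: the weighted inner product g^T W v.
scores = slotted_items @ (global_feature @ W)      # shape: (num_slots,)

# Treat the scores as unnormalized logits of a categorical distribution.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
clicked_slot = rng.choice(num_slots, p=probs)
```

Subtracting `scores.max()` before exponentiating is the usual numerically stable softmax trick; it does not change the resulting distribution.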
To model diversity and no-click, choose one of the following options:

- Ghost actions: every action (a list of recommended items) gets `item_dim` extra "ghost actions", whose item features are the unit vectors. If the environment's user model, based on the inner products over all items in the recommendation, chooses one of these ghost items, there was no suitable candidate in the neighborhood, and thus the user did not click on any of the real items. This relates to diversity: had the item feature space been covered better, the ghost items would have been selected with very low probability.
- Score threshold: calculate the scores of all items, and if none of them exceeds a given threshold, no item is selected by the user.
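Both no-click options can be sketched on top of the same scoring model. This is a hedged illustration, not the library's code; the variable names and the threshold value are made up. Note that because a ghost item's feature is a unit vector e_i, its score g^T W e_i is simply the i-th coordinate of g^T W.

```python
import numpy as np

rng = np.random.default_rng(1)

global_dim, item_dim, num_slots = 4, 3, 5
W = rng.normal(size=(global_dim, item_dim))
global_feature = rng.normal(size=global_dim)
slotted_items = rng.normal(size=(num_slots, item_dim))

weighted_global = global_feature @ W           # g^T W, shape: (item_dim,)
item_scores = slotted_items @ weighted_global  # real-item scores

# Option 1: ghost actions. The item_dim unit vectors act as extra items,
# so ghost item i scores exactly (g^T W)_i.
logits = np.concatenate([item_scores, weighted_global])
probs = np.exp(logits - logits.max())
probs /= probs.sum()
choice = rng.choice(len(logits), p=probs)
no_click_ghost = choice >= num_slots           # a ghost item was "chosen"

# Option 2: score threshold (the threshold value here is arbitrary).
threshold = 2.0
no_click_threshold = item_scores.max() <= threshold
```

In the library these behaviors are selected via the `ClickModel` enumeration listed below.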
Classes

- `class ClickModel`: Enumeration of user click models.
- `class ExplicitPositionalBiasRankingEnvironment`: A ranking environment in which one can explicitly set positional bias.
- `class FeedbackModel`: Enumeration of feedback models.
- `class RankingPyEnvironment`: Stationary Stochastic Bandit environment with per-arm features.
Other Members

| Member | Value |
|---|---|
| `GLOBAL_KEY` | `'global'` |
| `PER_ARM_KEY` | `'per_arm'` |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.