tfp.vi.mutual_information.lower_bound_info_nce
InfoNCE lower bound on mutual information.
tfp.vi.mutual_information.lower_bound_info_nce(
logu, joint_sample_mask=None, validate_args=False, name=None
)
The InfoNCE lower bound, proposed in [van den Oord et al. (2018)][1], is
based on noise contrastive estimation (NCE).
I(X; Y) >= 1/K sum(i=1:K, log( p_joint[i] / p_marginal[i])),
where the numerator and the denominator are, respectively,
p_joint[i] = p(x[i] | y[i]) = exp( f(x[i], y[i]) ),
p_marginal[i] = 1/K sum(j=1:K, p(x[i] | y[j]) )
= 1/K sum(j=1:K, exp( f(x[i], y[j]) ) ),
and `(x[i], y[i]), i=1:K` are samples from the joint distribution `p(x, y)`.
Pairs of points `(x, y)` are scored using a critic function `f`.
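To make the estimator concrete, here is a minimal sketch (an illustration, not the library implementation) that computes the bound above directly from a square `[K, K]` score matrix whose diagonal holds the joint-sample scores; `tf.reduce_logsumexp` keeps the `1/K sum(exp(...))` marginal term numerically stable in log space.

import tensorflow as tf

def info_nce_sketch(scores):
  # scores[i, j] = f(x[i], y[j]); the diagonal scores the joint pairs.
  k = tf.cast(tf.shape(scores)[0], scores.dtype)
  log_p_joint = tf.linalg.diag_part(scores)  # log p_joint[i]
  # log p_marginal[i] = logsumexp_j f(x[i], y[j]) - log K
  log_p_marginal = tf.reduce_logsumexp(scores, axis=1) - tf.math.log(k)
  return tf.reduce_mean(log_p_joint - log_p_marginal)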
Example:

`X` and `Y` are samples from a joint Gaussian distribution with
correlation `0.8`, each of dimension `1`.
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

batch_size, rho, dim = 10000, 0.8, 1
y, eps = tf.split(
    value=tf.random.normal(shape=(2 * batch_size, dim), seed=7),
    num_or_size_splits=2, axis=0)
mean, conditional_stddev = rho * y, tf.sqrt(1. - tf.square(rho))
x = mean + conditional_stddev * eps

# Conditional distribution p(x|y)
conditional_dist = tfd.MultivariateNormalDiag(
    mean, scale_diag=conditional_stddev * tf.ones((batch_size, dim)))

# Scores/unnormalized likelihoods of pairs of samples `x[i], y[j]`.
# (The scores have shape [x_batch_size, distribution_batch_size]
# because `lower_bound_info_nce` requires
# `scores[i, j] = f(x[i], y[j]) = log p(x[i] | y[j])`.)
scores = conditional_dist.log_prob(x[:, tf.newaxis, :])

# Mask for joint samples
joint_sample_mask = tf.eye(batch_size, dtype=bool)

# InfoNCE lower bound on mutual information
tfp.vi.mutual_information.lower_bound_info_nce(
    logu=scores, joint_sample_mask=joint_sample_mask)
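As a sanity check (not part of the original example), the mutual information of this bivariate Gaussian is available in closed form, so the returned bound can be compared against it; keep in mind that an InfoNCE estimate is also capped at `log(batch_size)`.

import numpy as np

# Closed-form MI of a bivariate Gaussian with correlation rho (in nats):
# I(X; Y) = -0.5 * log(1 - rho**2), which is about 0.51 for rho = 0.8.
true_mi = -0.5 * np.log(1. - rho**2)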
Args

| Argument | Description |
|---|---|
| `logu` | `float`-like `Tensor` of size `[batch_size_1, batch_size_2]` representing critic scores for pairs of points `(x, y)`, with `logu[i, j] = f(x[i], y[j])`. |
| `joint_sample_mask` | `bool`-like `Tensor` of the same size as `logu`, marking the positive samples (samples from the joint distribution `p(x, y)`) with `True`. Default value: `None`, in which case an identity matrix is constructed as the mask. See the sketch after this table for a non-default mask. |
| `validate_args` | Python `bool`, default `False`. Whether to validate input with asserts. If `validate_args` is `False` and the inputs are invalid, correct behavior is not guaranteed. |
| `name` | Python `str` name prefixed to Ops created by this function. Default value: `None` (i.e., `'lower_bound_info_nce'`). |
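When the joint pairs do not lie on the diagonal, or when `logu` is not square, an explicit mask can be supplied. A hypothetical sketch (the pairing layout below is invented for illustration):

# Hypothetical layout: 3 x-samples scored against 4 y-samples, where
# x[i] is paired with y[i + 1]; True marks each joint sample.
joint_sample_mask = tf.cast(tf.one_hot([1, 2, 3], depth=4), tf.bool)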
Returns

| Name | Description |
|---|---|
| `lower_bound` | `float`-like scalar lower bound on mutual information. |
References
[1]: Aaron van den Oord, Yazhe Li, Oriol Vinyals. Representation
Learning with Contrastive Predictive Coding. arXiv preprint
arXiv:1807.03748, 2018. https://arxiv.org/abs/1807.03748