tft.estimated_probability_density
Computes an approximate probability density at each x, given the bins.
tft.estimated_probability_density(
x: tf.Tensor,
boundaries: Optional[Union[tf.Tensor, int]] = None,
categorical: bool = False,
name: Optional[str] = None
) -> tf.Tensor
Using this type of fixed-interval method has several benefits compared to
bucketization, although it may not always be preferred.
- Quantiles do not work on categorical data.
- The quantiles algorithm does not currently operate on multiple features
jointly, only independently.
Example: outlier detection in a multi-modal or arbitrary distribution.
Imagine a value x where a simple model is highly predictive of a target y
within certain densely populated ranges. Outside these ranges, we may want
to treat the data differently, but there are too few samples for the model
to learn to handle them case by case.
One option would be to use the density estimate for this purpose:
# Inside a preprocessing_fn; OUTLIER_THRESHOLD is a user-chosen constant.
outputs['x_density'] = tft.estimated_probability_density(
    inputs['x'], boundaries=100)
outputs['outlier_x'] = tf.where(outputs['x_density'] < OUTLIER_THRESHOLD,
                                tf.constant([1]), tf.constant([0]))
This exercise uses a single variable for illustration, but a direct density
metric would become more useful with higher dimensions.
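The outlier idea above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: the helper `histogram_density`, the sample values, and the threshold of 2.0 are all hypothetical and chosen only for illustration. It assumes equal-width bins and clamps values onto the first or last bin.

```python
from bisect import bisect_right


def histogram_density(values, boundaries):
    """Per-value density estimate from fixed bins (illustrative helper)."""
    num_bins = len(boundaries) - 1
    counts = [0] * num_bins

    def bin_index(v):
        # Locate the bin containing v, clamping to the first/last bin.
        return min(max(bisect_right(boundaries, v) - 1, 0), num_bins - 1)

    for v in values:
        counts[bin_index(v)] += 1

    # Normalize the bin fraction by the average bin width.
    avg_width = (boundaries[-1] - boundaries[0]) / num_bins
    total = len(values)
    return [counts[bin_index(v)] / total / avg_width for v in values]


# Two dense clusters plus one isolated value between them.
xs = [0.11, 0.12, 0.15, 0.18, 0.82, 0.85, 0.88, 0.89, 0.5]
boundaries = [i / 10 for i in range(11)]  # 10 intervals over the 0-1 range
OUTLIER_THRESHOLD = 2.0  # hypothetical cutoff, picked for this toy data

density = histogram_density(xs, boundaries)
outlier_x = [1 if d < OUTLIER_THRESHOLD else 0 for d in density]
```

Here only the isolated value 0.5 falls in a sparsely populated bin, so it is the only one flagged.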
Note that we normalize by the average bin width to arrive at a probability
density estimate. The result resembles a PDF, not the probability that a value
falls in the bucket (except in the categorical case).
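To make the distinction between a bin probability and a density concrete, here is a small pure-Python sketch. It is an illustration under assumed toy data, not the library's code; `bin_fractions` and all sample values are hypothetical. Dividing each bin's fraction by the average bin width gives a density that can exceed 1, while the categorical case is a plain relative frequency.

```python
from bisect import bisect_right
from collections import Counter


def bin_fractions(values, boundaries):
    """Fraction of samples falling in each bin (illustrative helper)."""
    num_bins = len(boundaries) - 1
    counts = [0] * num_bins
    for v in values:
        i = min(max(bisect_right(boundaries, v) - 1, 0), num_bins - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


values = [0.05, 0.15, 0.15, 0.35]
boundaries = [0.0, 0.1, 0.2, 0.3, 0.4]

# Probability that a value falls in each bin; these sum to 1.
fractions = bin_fractions(values, boundaries)

# Density: the bin fraction divided by the average bin width.
avg_width = (boundaries[-1] - boundaries[0]) / (len(boundaries) - 1)
densities = [f / avg_width for f in fractions]

# Categorical case: a probability mass per category, with no width involved.
labels = ["a", "b", "a", "c"]
mass = {k: n / len(labels) for k, n in Counter(labels).items()}
```

Multiplying each density by the bin width and summing recovers 1, which is what makes the numeric result behave like a PDF rather than a per-bucket probability.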
Args:
  x: A Tensor.
  boundaries: (Optional) A Tensor or int used to approximate the density.
    If possible, provide boundaries as a Tensor of multiple sorted values.
    Defaults to 10 intervals over the 0-1 range, or finds the min/max
    if an int is provided (not recommended because multi-phase analysis is
    inefficient). When boundaries are given as arbitrary interval
    boundaries, the interval sizes are assumed to be equal; if the sizes are
    unequal, the density may be inaccurate. Ignored if categorical is true.
  categorical: (Optional) A bool that will treat x as categorical if true.
  name: (Optional) A name for this operation.
Returns:
  A Tensor the same shape as x, the probability density estimate at x (or
  probability mass estimate if categorical is True).
Raises:
  NotImplementedError: If x is a CompositeTensor.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-11-01 UTC.