A TFX component to transform the input examples.
Inherits From: BaseBeamComponent, BaseComponent, BaseNode
tfx.v1.components.Transform(
    examples: tfx.v1.types.BaseChannel,
    schema: tfx.v1.types.BaseChannel,
    module_file: Optional[Union[str, tfx.v1.dsl.experimental.RuntimeParameter]] = None,
    preprocessing_fn: Optional[Union[str, tfx.v1.dsl.experimental.RuntimeParameter]] = None,
    splits_config: Optional[tfx.v1.proto.SplitsConfig] = None,
    analyzer_cache: Optional[tfx.v1.types.BaseChannel] = None,
    materialize: bool = True,
    disable_analyzer_cache: bool = False,
    force_tf_compat_v1: bool = False,
    custom_config: Optional[Dict[str, Any]] = None,
    disable_statistics: bool = False,
    stats_options_updater_fn: Optional[str] = None
)
The Transform component wraps TensorFlow Transform (tf.Transform) to preprocess data in a TFX pipeline. The component loads the preprocessing_fn from the input module file, preprocesses both the 'train' and 'eval' splits of the input examples, generates the tf.Transform output, and saves both the transform function and the transformed examples to locations chosen by the orchestrator.
The Transform component can also invoke TFDV to compute statistics on the pre-transform and post-transform data. Invocations of TFDV take an optional StatsOptions object. To configure the StatsOptions object that is passed to TFDV for both pre-transform and post-transform statistics, users can define the optional stats_options_updater_fn within the module file.
Providing a preprocessing function
The Transform executor loads the user-supplied preprocessing code from the file given by module_file. It looks specifically for the preprocessing_fn() function within that file.
An example of preprocessing_fn() can be found in the user-supplied code of the TFX Chicago Taxi pipeline example.
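As a rough illustration of what a module file might contain, the sketch below defines a preprocessing_fn() that takes a dict of features and returns a dict of features. It is not taken from the Chicago Taxi example: the `_TRANSFORMED_SUFFIX` constant is a hypothetical naming choice, and the body only renames keys so that the shape of the function is visible without any TensorFlow dependency. A real function would normally apply tf.Transform operations to the tensors.

```python
from typing import Any, Dict

# Hypothetical suffix used to mark transformed feature names (an assumption
# made for this sketch, not something Transform requires).
_TRANSFORMED_SUFFIX = "_xf"


def preprocessing_fn(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Maps raw features to transformed features.

    In a real pipeline the dict values are tf.Tensor or tf.SparseTensor
    objects. This sketch passes each value through unchanged and only
    renames the keys; real code would transform the values here.
    """
    outputs = {}
    for name, tensor in inputs.items():
        outputs[name + _TRANSFORMED_SUFFIX] = tensor
    return outputs
```

The path of the file holding this function is what would be passed as module_file.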
Updating StatsOptions
The Transform executor will look specifically for the stats_options_updater_fn() within the module file specified above.
An example of stats_options_updater_fn() can be found in the user-supplied code of the TFX BERT MRPC pipeline example.
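A minimal sketch of such a function is shown below. It assumes the function receives the kind of statistics being computed followed by the StatsOptions object, and returns the (possibly modified) StatsOptions; the argument order and the `num_top_values` attribute are assumptions made for illustration and are not copied from the BERT MRPC example. A `SimpleNamespace` stands in for the StatsOptions object so the sketch runs without TFDV installed.

```python
from types import SimpleNamespace
from typing import Any


def stats_options_updater_fn(stats_type: Any, stats_options: Any) -> Any:
    """Adjusts and returns the StatsOptions object handed in by Transform.

    stats_type says whether pre-transform or post-transform statistics are
    being computed; this sketch applies the same change in both cases.
    """
    # Assumption: num_top_values is a settable attribute of StatsOptions.
    stats_options.num_top_values = 10
    return stats_options


# Stand-in for a StatsOptions object, used only to exercise the sketch.
updated = stats_options_updater_fn(
    "pre_transform", SimpleNamespace(num_top_values=20)
)
```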
Example
# Performs transformations and feature engineering in training and serving.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    module_file=module_file)
The component outputs contain:
- transform_graph: Channel of type standard_artifacts.TransformGraph, which includes an exported TensorFlow graph suitable for both training and serving.
- transformed_examples: Channel of type standard_artifacts.Examples for materialized transformed examples, which includes transform splits as specified in splits_config. This output is optional and is controlled by materialize.
Please see the Transform guide for more details.
Args | |
---|---|
examples
|
A BaseChannel of type standard_artifacts.Examples (required).
This should contain custom splits specified in splits_config. If custom
split is not provided, this should contain two splits 'train' and
'eval'.
|
schema
|
A BaseChannel of type standard_artifacts.Schema . This should
contain a single schema artifact.
|
module_file
|
The file path to a Python module file, from which the
'preprocessing_fn' function will be loaded.
Exactly one of 'module_file' or 'preprocessing_fn' must be supplied.
The function needs to have the following signature:
def preprocessing_fn(inputs: Dict[Text, Any]) -> Dict[Text, Any]
where the values of the input and returned Dict are either tf.Tensor or tf.SparseTensor. If additional inputs are needed for preprocessing_fn, they can be passed in custom_config:
def preprocessing_fn(inputs: Dict[Text, Any], custom_config: Dict[Text, Any]) -> Dict[Text, Any]
To update the stats options used to compute the pre-transform or post-transform statistics, optionally define the 'stats_options_updater_fn' within the same module. If implemented, this function needs to have the following signature:
def stats_options_updater_fn(stats_type: tfx.components.transform.stats_options_util.StatsType, stats_options: tfdv.StatsOptions) -> tfdv.StatsOptions
Use of a RuntimeParameter for this argument is experimental. |
preprocessing_fn
|
The path to a Python function that implements a 'preprocessing_fn'. See 'module_file' for the expected signature of the function. Exactly one of 'module_file' or 'preprocessing_fn' must be supplied. Use of a RuntimeParameter for this argument is experimental. |
splits_config
|
A transform_pb2.SplitsConfig instance, providing the splits that should be analyzed and the splits that should be transformed. Note that the analyze and transform splits can overlap. The default behavior (when splits_config is not set) is to analyze the 'train' split and transform all splits. If splits_config is set, analyze cannot be empty. |
analyzer_cache
|
Optional input 'TransformCache' channel containing cached information from previous Transform runs. When provided, Transform will try to use the cached calculation if possible. |
materialize
|
If True, write transformed examples as an output. |
disable_analyzer_cache
|
If False, Transform will use input cache if
provided and write cache output. If True, analyzer_cache must not be
provided.
|
force_tf_compat_v1
|
(Optional) If True, or if TF2 behaviors are disabled,
Transform will use TensorFlow in compat.v1 mode irrespective of the
installed version of TensorFlow. Defaults to False.
|
custom_config
|
A dict which contains additional parameters that will be passed to preprocessing_fn. |
disable_statistics
|
If True, do not invoke TFDV to compute pre-transform
and post-transform statistics. When statistics are computed, they
will be stored in the pre_transform_feature_stats/ and
post_transform_feature_stats/ subfolders of the transform_graph
export.
|
stats_options_updater_fn
|
The path to a python function that implements a 'stats_options_updater_fn'. See 'module_file' for expected signature of the function. 'stats_options_updater_fn' cannot be defined if 'module_file' is specified. |
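The splits_config rules described in the table above can be sketched in plain Python. This is an illustration of the documented behavior only, not TFX's implementation: `resolve_splits` and its parameters are hypothetical names, and treating a missing transform list as empty is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_splits(
    available_splits: Sequence[str],
    analyze: Optional[Sequence[str]] = None,
    transform: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Returns (analyze_splits, transform_splits) per the documented rules."""
    if analyze is None and transform is None:
        # splits_config not set: analyze 'train' and transform every split.
        return ["train"], list(available_splits)
    if not analyze:
        # splits_config set: the analyze list must not be empty.
        raise ValueError("analyze cannot be empty when splits_config is set")
    # Assumption: an unset transform list is treated as empty.
    return list(analyze), list(transform or [])
```

For example, with splits 'train' and 'eval' and no configuration, the sketch analyzes 'train' and transforms both splits; analyze and transform lists may also overlap when set explicitly.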
Raises | |
---|---|
ValueError
|
When both or neither of 'module_file' and 'preprocessing_fn' are supplied. |
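The "exactly one of" rule behind this error can be illustrated with a small check. The function name `check_preprocessing_source` is hypothetical and the error message wording is an assumption; the sketch only mirrors the documented condition and is not the component's actual code.

```python
from typing import Optional


def check_preprocessing_source(
    module_file: Optional[str], preprocessing_fn: Optional[str]
) -> None:
    """Raises ValueError unless exactly one of the two arguments is given."""
    # bool() values are equal when both are supplied or both are missing.
    if bool(module_file) == bool(preprocessing_fn):
        raise ValueError(
            "Exactly one of 'module_file' or 'preprocessing_fn' must be supplied."
        )
```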
Attributes | |
---|---|
outputs
|
Component's output channel dict. |
Methods
with_beam_pipeline_args
with_beam_pipeline_args(
beam_pipeline_args: Iterable[Union[str, placeholder.Placeholder]]
) -> 'BaseBeamComponent'
Adds per-component Beam pipeline args.
Args | |
---|---|
beam_pipeline_args
|
List of Beam pipeline args to be added to the Beam executor spec. |
Returns | |
---|---|
the same component itself. |
with_node_execution_options
with_node_execution_options(
node_execution_options: utils.NodeExecutionOptions
) -> typing_extensions.Self
Class Variables | |
---|---|
POST_EXECUTABLE_SPEC |
None
|
PRE_EXECUTABLE_SPEC |
None
|