Validates TFExamples in TFRecord files.
tfdv.validate_examples_in_tfrecord(
    data_location: Text,
    stats_options: tfdv.StatsOptions,
    output_path: Optional[Text] = None,
    pipeline_options: Optional[PipelineOptions] = None,
    num_sampled_examples=0
) -> Union[statistics_pb2.DatasetFeatureStatisticsList, Tuple[statistics_pb2.DatasetFeatureStatisticsList, Mapping[str, List[tf.train.Example]]]]
Runs a Beam pipeline to detect anomalies on a per-example basis. If this
function detects anomalous examples, it generates summary statistics regarding
the set of examples that exhibit each anomaly.
This is a convenience function for users with data in TFRecord format.
Users with data in unsupported file or data formats, or users who wish
to create their own Beam pipelines, should use the 'IdentifyAnomalousExamples'
PTransform API directly instead.
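A minimal sketch of a typical call, assuming the schema is stored as a text-format Schema proto and loaded with `tfdv.load_schema_text`. The wrapper name `find_anomalous_examples` and its `schema_path` parameter are hypothetical and exist only for illustration.

```python
def find_anomalous_examples(data_location, schema_path, output_path=None):
    """Run per-example validation over TFRecord files against a schema.

    Hypothetical helper: `schema_path` is assumed to point at a
    text-format Schema proto.
    """
    # Imported lazily so this module can be loaded where TFDV is absent.
    import tensorflow_data_validation as tfdv

    schema = tfdv.load_schema_text(schema_path)
    # The schema is mandatory; without it the call raises ValueError.
    stats_options = tfdv.StatsOptions(schema=schema)
    return tfdv.validate_examples_in_tfrecord(
        data_location=data_location,
        stats_options=stats_options,
        output_path=output_path,
    )
```

With the default `num_sampled_examples=0`, the value returned here is the statistics proto alone.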
Args
data_location | The location of the input data files.
stats_options | tfdv.StatsOptions for generating data statistics. This must contain a schema.
output_path | The file path to which the data statistics result is written. If None, the function uses a temporary directory. The output is a TFRecord file containing a single data statistics list proto, and can be read with the 'load_statistics' function. If you run this function on Google Cloud, you must specify an output_path; specifying None may cause an error.
pipeline_options | Optional Beam pipeline options. These let users specify Beam pipeline execution parameters such as the pipeline runner (DirectRunner or DataflowRunner) and the Cloud Dataflow service project ID. See https://cloud.google.com/dataflow/pipelines/specifying-exec-params for more details.
num_sampled_examples | If nonzero, returns up to this many examples of each anomaly type as a map from anomaly reason string to a list of tf.Examples.
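The following sketch shows one way to combine the arguments above for a Cloud Dataflow run. The helper names `dataflow_flags` and `validate_on_dataflow` are hypothetical; the flags are standard Beam options, but a real Dataflow job may need more of them (for example, to install TFDV on the workers). The explicit `output_path` check mirrors the documented requirement for Google Cloud and is not part of TFDV itself.

```python
def dataflow_flags(project, region, temp_location):
    """Build Beam command-line flags for a Cloud Dataflow run."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]


def validate_on_dataflow(data_location, stats_options, output_path, flags,
                         num_sampled_examples=0):
    """Hypothetical wrapper that insists on an explicit output_path."""
    if not output_path:
        # On Google Cloud the default temporary directory may cause an error.
        raise ValueError("output_path is required when running on Google Cloud")

    # Imported lazily so the flag builder works without Beam or TFDV.
    from apache_beam.options.pipeline_options import PipelineOptions
    import tensorflow_data_validation as tfdv

    return tfdv.validate_examples_in_tfrecord(
        data_location=data_location,
        stats_options=stats_options,
        output_path=output_path,
        pipeline_options=PipelineOptions(flags),
        num_sampled_examples=num_sampled_examples,
    )
```

For a local run, the same pattern would apply with `--runner=DirectRunner` and no project, region, or temp location flags.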
Returns
If num_sampled_examples is zero, returns a single DatasetFeatureStatisticsList proto in which each dataset consists of the set of examples that exhibit a particular anomaly. If num_sampled_examples is nonzero, returns the same statistics proto as well as a mapping from anomaly to a list of tf.Examples that exhibited that anomaly.
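Because the return shape depends on num_sampled_examples, calling code may want to normalize it. The sketch below uses hypothetical helpers `unpack_result` and `examples_per_anomaly`. It relies on the `datasets`, `name`, and `num_examples` fields of the statistics proto, and assumes each dataset's `name` identifies the anomaly it groups.

```python
def unpack_result(result):
    """Return (stats, samples) for either return shape of the function."""
    if isinstance(result, tuple):
        # num_sampled_examples was nonzero: (stats proto, anomaly -> examples).
        stats, samples = result
        return stats, dict(samples)
    # num_sampled_examples was zero: only the stats proto was returned.
    return result, {}


def examples_per_anomaly(stats):
    """Map each dataset name to the number of examples it contains.

    Assumption: the dataset name identifies the anomaly for that group.
    """
    return {dataset.name: dataset.num_examples for dataset in stats.datasets}
```

Normalizing to a pair keeps downstream code free of type checks on the result.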
Raises
ValueError | If the specified stats_options does not include a schema.
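A caller can check for the schema before launching a pipeline. The sketch below is a hypothetical pre-flight helper, `require_schema`, not part of TFDV. It assumes the options object exposes its schema as a `schema` attribute, as tfdv.StatsOptions does.

```python
def require_schema(stats_options):
    """Fail early with a clear message if the options carry no schema."""
    if getattr(stats_options, "schema", None) is None:
        raise ValueError(
            "stats_options must include a schema, "
            "e.g. tfdv.StatsOptions(schema=schema)"
        )
    return stats_options
```

Since the helper returns the options unchanged, it can wrap the argument inline, as in `stats_options=require_schema(opts)`.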