tfdv.generate_statistics_from_tfrecord
Compute data statistics from TFRecord files containing TFExamples.
tfdv.generate_statistics_from_tfrecord(
    data_location: Text,
    output_path: Optional[bytes] = None,
    stats_options: tfdv.StatsOptions = options.StatsOptions(),
    pipeline_options: Optional[PipelineOptions] = None
) -> statistics_pb2.DatasetFeatureStatisticsList
Runs a Beam pipeline to compute the data statistics and returns the
resulting data statistics proto.
This is a convenience method for users with data in TFRecord format.
Users with data in unsupported file/data formats, or users who wish
to create their own Beam pipelines, should use the 'GenerateStatistics'
PTransform API directly instead.
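As a rough illustration of the convenience call, the sketch below wraps it in a small helper. The helper name compute_stats and the example path in the usage note are hypothetical; only tfdv.generate_statistics_from_tfrecord and its data_location and output_path parameters come from the signature above. It assumes the tensorflow_data_validation package is installed.

```python
def compute_stats(data_location: str, output_path=None):
    """Compute statistics over TFRecord files containing TFExamples.

    Hypothetical helper: a thin wrapper around the documented function.
    """
    # Imported lazily so that merely defining this helper does not
    # require TFDV to be installed.
    import tensorflow_data_validation as tfdv

    # With output_path=None, the result is written to a temporary directory.
    return tfdv.generate_statistics_from_tfrecord(
        data_location=data_location,
        output_path=output_path,
    )
```

A call such as compute_stats("/tmp/data/train.tfrecord") (a placeholder path) would be expected to return a DatasetFeatureStatisticsList proto.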
Args
data_location: The location of the input data files.
output_path: The file path to write the data statistics result to. If None, a temporary directory is used. The output is a TFRecord file containing a single data statistics proto, and can be read with the 'load_statistics' API. If you run this function on Google Cloud, you must specify an output_path; specifying None may cause an error.
stats_options: tfdv.StatsOptions for generating data statistics.
pipeline_options: Optional Beam pipeline options. These let users specify Beam pipeline execution parameters such as the pipeline runner (DirectRunner or DataflowRunner), the Cloud Dataflow service project ID, etc. See https://cloud.google.com/dataflow/pipelines/specifying-exec-params for more details.
Returns
A DatasetFeatureStatisticsList proto.
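The optional arguments can be combined as in the following sketch, which passes an explicit output path, custom statistics options, and Beam pipeline options, then reads the written file back with load_statistics. The helper name and the specific option values (num_top_values=10, the DirectRunner flag) are illustrative assumptions, not recommendations; it assumes tensorflow_data_validation and apache_beam are installed.

```python
def compute_stats_with_options(data_location, output_path):
    """Hypothetical helper showing the optional arguments together."""
    # Lazy imports: defining the helper does not require TFDV or Beam.
    import tensorflow_data_validation as tfdv
    from apache_beam.options.pipeline_options import PipelineOptions

    # Illustrative settings; adjust for your data and environment.
    stats_options = tfdv.StatsOptions(num_top_values=10)
    pipeline_options = PipelineOptions(["--runner=DirectRunner"])

    stats = tfdv.generate_statistics_from_tfrecord(
        data_location=data_location,
        output_path=output_path,
        stats_options=stats_options,
        pipeline_options=pipeline_options,
    )

    # The file at output_path holds a single statistics proto and can be
    # read back later with load_statistics.
    reloaded = tfdv.load_statistics(output_path)
    return stats, reloaded
```

Passing an explicit output_path here also follows the note above about runs on Google Cloud, where leaving it as None may cause an error.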
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2024-10-18 UTC.