BigQueryClient is the entry point for interacting with Cloud BigQuery from TensorFlow.
tfio.bigquery.BigQueryClient()
BigQueryClient encapsulates a connection to Cloud BigQuery and exposes the read_session method, which initiates a BigQuery read session.
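A minimal end-to-end sketch follows. It assumes tensorflow-io is installed and Google Cloud credentials are available in the environment. The helper names (billing_parent, open_session), the project, dataset and table IDs, and the column names "name" and "age" are all hypothetical placeholders. The call uses the list form of selected_fields with a matching output_types list, and passes the arguments by keyword because the positional order is parent, project_id, table_id, dataset_id.

```python
def billing_parent(billing_project_id: str) -> str:
    """Builds the `parent` argument ("projects/{project_id}") for the billed project."""
    return f"projects/{billing_project_id}"


def open_session(billing_project_id: str, data_project_id: str,
                 dataset_id: str, table_id: str):
    """Opens a read session on a table. Column names below are hypothetical."""
    # Imported lazily so billing_parent() can be used without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_io as tfio

    client = tfio.bigquery.BigQueryClient()
    return client.read_session(
        parent=billing_parent(billing_project_id),  # project billed for usage
        project_id=data_project_id,                 # project that owns the dataset
        table_id=table_id,
        dataset_id=dataset_id,
        selected_fields=["name", "age"],            # hypothetical columns
        output_types=[tf.string, tf.int64],         # same order as selected_fields
        row_restriction="age >= 18",                # filter, like a SQL WHERE clause
        requested_streams=2,                        # BigQuery may return fewer streams
    )
```

For example, open_session("my-billing-project", "my-data-project", "my_dataset", "my_table").parallel_read_rows() yields a tf.data.Dataset whose elements are dicts keyed by column name. Keeping the billed project separate from the data project mirrors the distinction between parent and project_id.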
Child Classes
class DataFormat
class FieldMode
Methods
read_session
read_session(
    parent,
    project_id,
    table_id,
    dataset_id,
    selected_fields,
    output_types=None,
    default_values=None,
    row_restriction='',
    requested_streams=1,
    data_format: tfio.bigquery.BigQueryClient.DataFormat = tfio.bigquery.BigQueryClient.DataFormat.AVRO
)
Opens a session and returns a BigQueryReadSession object.
Args:
  parent: String of the form projects/{project_id} indicating the project
    this ReadSession is associated with. This is the project that will be
    billed for usage.
  project_id: The ID of the project that contains the dataset. It may differ
    from the billed project given in parent.
  table_id: The ID of the table in the dataset.
  dataset_id: The ID of the dataset in the project.
  selected_fields: Either a list or a dict. If a list, it contains the names
    of the table fields to read. If a dict, it should look like:
      {
        "field_a_name": {"mode": "repeated", "output_type": dtypes.int64},
        "field_b_name": {"mode": "nullable", "output_type": dtypes.int32, "default_value": 0},
        ...
        "field_x_name": {"mode": "repeated", "output_type": dtypes.string, "default_value": ""}
      }
    "mode" is the BigQuery column mode: 'repeated', 'nullable' or 'required'.
    If "mode" is not specified, it defaults to "nullable". If "output_type" is
    not specified, DT_STRING is implied. The order of the output fields is
    unrelated to the order of fields in selected_fields.
  output_types: Types of the output tensors, in the same order as
    selected_fields. Only needed when selected_fields is a list; when it is a
    dict, each field's type is given by its "output_type" entry as described
    above. If not specified, DT_STRING is implied for all tensors.
  default_values: Default values to use when the underlying value is null,
    in the same order as selected_fields. If not specified, type-appropriate
    defaults are used (0 for numeric types, an empty string for strings, and
    False for booleans).
  row_restriction: Optional. SQL text filtering statement, similar to a
    WHERE clause in a query.
  requested_streams: Desired number of streams that can be read in parallel.
    Must be a positive number. The number of streams actually returned by the
    BigQuery Storage API may be lower, depending on the amount of parallelism
    that is reasonable for the table and the maximum amount of parallelism
    allowed by the system.
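The defaulting rules for the dict form of selected_fields can be made explicit with a small helper. This is a hypothetical illustration of the documented behavior, not the library's own normalization code. It takes the string dtype as a parameter (normally tf.string) so the logic can be exercised without TensorFlow, and it only accepts the lowercase mode spellings listed above.

```python
from typing import Any, Dict

# Column modes as documented for the dict form of selected_fields.
VALID_MODES = {"repeated", "nullable", "required"}


def with_documented_defaults(selected_fields: Dict[str, Dict[str, Any]],
                             string_dtype: Any) -> Dict[str, Dict[str, Any]]:
    """Returns a copy of a dict-form selected_fields with the documented
    defaults filled in: "mode" -> "nullable", "output_type" -> string_dtype.
    Hypothetical helper for illustration only."""
    result = {}
    for name, spec in selected_fields.items():
        spec = dict(spec)  # copy so the caller's dict is not mutated
        spec.setdefault("mode", "nullable")
        spec.setdefault("output_type", string_dtype)
        if spec["mode"] not in VALID_MODES:
            raise ValueError(f"{name}: unsupported mode {spec['mode']!r}")
        result[name] = spec
    return result
```

With TensorFlow available, a call such as with_documented_defaults({"tags": {"mode": "repeated"}, "score": {"output_type": tf.int64, "default_value": 0}}, tf.string) makes it visible that "tags" is read as a repeated string column and "score" as a nullable int64 column.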
Returns:
  A BigQueryReadSession Python object representing the operations available
  on the table.
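The returned session is commonly turned into a tf.data.Dataset with its parallel_read_rows() method, whose elements are dicts keyed by column name. The sketch below splits each row into (features, label) pairs and batches them, for example for Keras training. The helper names, the label column name and the batch size are hypothetical; session is assumed to be the object returned by read_session.

```python
from typing import Any, Dict, Tuple


def split_label(row: Dict[str, Any], label_key: str) -> Tuple[Dict[str, Any], Any]:
    """Splits one row dict into (features, label).

    Works on plain dicts and on the dict-of-tensors elements that
    tf.data.Dataset.map() passes to its function.
    """
    features = dict(row)            # shallow copy so the input row is unchanged
    label = features.pop(label_key)
    return features, label


def training_dataset(session, label_key: str = "label", batch_size: int = 32):
    """Builds a batched (features, label) dataset from a BigQueryReadSession."""
    dataset = session.parallel_read_rows()  # reads the session's streams in parallel
    return dataset.map(lambda row: split_label(row, label_key)).batch(batch_size)
```

Keeping split_label a plain function means the row transformation can be checked on ordinary dicts before it is used inside the tf.data pipeline.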