tfdf.keras.pd_dataframe_to_tf_dataset
Converts a Pandas DataFrame into a TF Dataset compatible with Keras.
tfdf.keras.pd_dataframe_to_tf_dataset(
    dataframe,
    label: Optional[str] = None,
    task: Optional[TaskType] = Task.CLASSIFICATION,
    max_num_classes: Optional[int] = 100,
    in_place: Optional[bool] = False,
    fix_feature_names: Optional[bool] = True,
    weight: Optional[str] = None,
    batch_size: Optional[int] = 1000
) -> tf.data.Dataset
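Used in the notebooks
Used in the guide:
- Migration examples: Canned Estimators (https://www.tensorflow.org/guide/migrate/canned_estimators)
Used in the tutorials:
- Automated hyper-parameter tuning (https://www.tensorflow.org/decision_forests/tutorials/automatic_tuning_colab)
- Getting started (https://www.tensorflow.org/decision_forests/tutorials/beginner_colab)
- Visualizing TensorFlow Decision Forest Trees with dtreeviz (https://www.tensorflow.org/decision_forests/tutorials/dtreeviz_colab)
- Making predictions (https://www.tensorflow.org/decision_forests/tutorials/predict_colab)
- Using text and neural network features (https://www.tensorflow.org/decision_forests/tutorials/intermediate_colab)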
Details
- Ensures columns have uniform types.
- If "label" is provided, it is separated out as the second component of each
  tf.data.Dataset element (as expected by Keras).
- If "weight" is provided, it is separated out as the third component of each
  tf.data.Dataset element (as expected by Keras).
- If "task" is provided, ensures the label has the correct dtype. If the task
  is classification and the label is a string, the labels are converted to
  integers. In this case, the label values are extracted from the dataset and
  ordered lexicographically. Warning: This logic won't work as expected if the
  training and testing datasets contain different label values. In such a
  case, it is preferable to convert the label to integers beforehand, making
  sure the same encoding is used for all the datasets.
- Returns a dataset built with tf.data.Dataset.from_tensor_slices.
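The warning above can be made concrete with a small pandas sketch. It first shows how encoding each split on its own (sorted unique values) can assign different integers to the same label, and then applies one shared vocabulary to both splits. The DataFrames, the "species" column, and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical train/test splits; "species" is an illustrative label column.
train_df = pd.DataFrame(
    {"length": [1.2, 3.4, 2.2], "species": ["cat", "dog", "bird"]}
)
test_df = pd.DataFrame({"length": [2.9, 3.1], "species": ["dog", "cat"]})

# Encoding each split independently (sorted unique values) gives inconsistent
# integers when the splits contain different label values.
train_only = {
    name: i for i, name in enumerate(sorted(train_df["species"].unique()))
}
test_only = {
    name: i for i, name in enumerate(sorted(test_df["species"].unique()))
}
# train_only maps "dog" to 2, while test_only maps "dog" to 1.

# Instead, build one vocabulary (here from the training labels) and apply it
# to every split before converting the DataFrames.
label_to_index = train_only
train_df["species"] = train_df["species"].map(label_to_index)
test_df["species"] = test_df["species"].map(label_to_index)
```

After this step both frames hold integer labels with the same meaning and can be passed to pd_dataframe_to_tf_dataset with label="species". Note that Series.map yields NaN for any label missing from the vocabulary, so a test split with unseen labels needs separate handling.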
Args
dataframe: Pandas DataFrame containing a training or evaluation dataset.
label: Name of the label column.
task: Target task of the dataset.
max_num_classes: Maximum number of classes for a classification task. A high
  number of unique values (classes) might indicate that the problem is a
  regression or a ranking instead of a classification. Set to None to
  disable checking the number of classes.
in_place: If false (default), the input dataframe will not be modified by
  pd_dataframe_to_tf_dataset. However, a copy of the dataset memory will
  be made. If true, the dataframe will be modified in-place.
fix_feature_names: Some feature names are not supported by the SavedModel
  signature. If fix_feature_names=True (default), such features will be
  renamed and made compatible. If fix_feature_names=False, the feature
  names will not be changed, but exporting the model might fail (e.g. when
  calling model.save(...)).
weight: Optional name of a column in dataframe to use to weight the
  training.
batch_size: Number of examples in each batch. The size of the batches has no
  impact on the TF-DF training algorithms. However, a small batch size can
  lead to a large overhead when loading the dataset. Defaults to 1000. If
  batch_size is set to None, no batching is applied. Note: TF-DF expects the
  dataset to be batched.
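As a rough illustration of the max_num_classes, weight, and batch_size arguments, the sketch below prepares a DataFrame with plain pandas before conversion. The column names, the inverse-frequency weighting scheme, and the num_batches helper are hypothetical choices made for this example and are not part of TF-DF. The batch count also assumes that a final partial batch is kept rather than dropped.

```python
import math

import pandas as pd

# Hypothetical toy data with an imbalanced integer label.
df = pd.DataFrame(
    {
        "feature": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2],
        "label": [0, 0, 0, 0, 1, 1],
    }
)

# max_num_classes: count the distinct label values up front. A count far above
# the limit suggests the target is better treated as regression or ranking.
MAX_NUM_CLASSES = 100  # the documented default
num_classes = df["label"].nunique()
assert num_classes <= MAX_NUM_CLASSES

# weight: add a per-example weight column whose name can be passed as
# `weight=`. Inverse class frequency is an illustrative choice only.
class_counts = df["label"].value_counts()
df["w"] = len(df) / (num_classes * df["label"].map(class_counts))


# batch_size: number of batches a dataset of `num_examples` rows would yield,
# assuming the last partial batch is kept.
def num_batches(num_examples: int, batch_size: int) -> int:
    return math.ceil(num_examples / batch_size)
```

With this frame, passing weight="w" would tell the conversion to emit the "w" column as the per-example weight. With the default batch_size of 1000, the six rows fit in a single batch.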
Returns
A TensorFlow Dataset (tf.data.Dataset).
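A minimal end-to-end sketch of the call is shown below, using only the arguments documented on this page. The toy DataFrame, its column names, and the to_keras_dataset wrapper are hypothetical. The wrapper imports tensorflow_decision_forests lazily so that the data preparation can run on its own.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [22, 35, 58, 41],
        "income": [28.0, 52.5, 61.0, 47.2],
        "sample_weight": [1.0, 1.0, 2.0, 1.0],
        "purchased": [0, 1, 1, 0],  # integer-encoded label
    }
)


def to_keras_dataset(frame: pd.DataFrame):
    """Converts `frame` into a batched tf.data.Dataset for TF-DF."""
    # Imported lazily so the DataFrame preparation above runs even where
    # TensorFlow Decision Forests is not installed.
    import tensorflow_decision_forests as tfdf

    return tfdf.keras.pd_dataframe_to_tf_dataset(
        frame,
        label="purchased",
        weight="sample_weight",
        task=tfdf.keras.Task.CLASSIFICATION,
        batch_size=2,
    )
```

Because both a label and a weight are given, each element of to_keras_dataset(df) should be a (features, label, weight) tuple, with features keyed by column name. The result can be passed directly to a TF-DF model's fit method.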
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.