text.pad_model_inputs
Pad model input and generate corresponding input masks.
text.pad_model_inputs(
input, max_seq_length, pad_value=0
)
Used in the guide: BERT Preprocessing with TF Text (https://www.tensorflow.org/text/guide/bert_preprocessing_guide)
pad_model_inputs performs the final packaging of a model's inputs commonly found in text models. This includes padding out (or simply truncating) to a fixed-size Tensor of at most 2 dimensions and generating a mask Tensor of the same shape, with values of 0 where the corresponding item is a pad value and 1 where it is part of the original input.
Note that a simple truncation strategy (drop everything after the maximum sequence length) is used to force the inputs to the specified shape. This can discard tokens the model needs (for example, a trailing separator), so users should instead apply a Trimmer upstream to safely truncate large inputs.
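To make the truncation caveat concrete, here is a minimal pure-Python sketch. It does not use tensorflow_text; the helpers `combine` and `trim_to_budget` are hypothetical, and the IDs 101 and 102 are assumed to be start and separator markers as in BERT-style vocabularies. It compares tail truncation of an already combined input with trimming the segments before combining them. The trimming rule shown is only one possible strategy and is not claimed to match any library Trimmer.

```python
CLS, SEP = 101, 102  # assumed start/separator IDs (BERT-style convention)

def combine(seg_a, seg_b):
    """Join two segments as [CLS] a [SEP] b [SEP]."""
    return [CLS] + seg_a + [SEP] + seg_b + [SEP]

def trim_to_budget(seg_a, seg_b, budget):
    """Hypothetical helper: pop tokens from the end of the longer
    segment until both segments together fit within `budget`."""
    a, b = list(seg_a), list(seg_b)
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    return a, b

max_seq_length = 8
seg_a = [1, 2, 3, 4, 5]
seg_b = [10, 20, 30, 40]

# Tail truncation after combining: the final separator and most of
# the second segment are dropped.
naive = combine(seg_a, seg_b)[:max_seq_length]

# Trimming first: reserve 3 slots for the special tokens, then combine.
a, b = trim_to_budget(seg_a, seg_b, max_seq_length - 3)
safe = combine(a, b)

print(naive)  # [101, 1, 2, 3, 4, 5, 102, 10]
print(safe)   # [101, 1, 2, 3, 102, 10, 20, 102]
```

Both results have length 8, but only the trimmed version keeps the closing separator and a balanced share of each segment.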
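import numpy as np
import tensorflow as tf
import tensorflow_text as text

# Three token-ID sequences of different lengths (7, 10, and 8).
input_data = tf.ragged.constant([
    [101, 1, 2, 102, 10, 20, 102],
    [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
    [101, 5, 6, 7, 8, 9, 102, 70],
], np.int32)
# Pad or truncate every row to length 9 and build the matching mask.
data, mask = text.pad_model_inputs(input=input_data, max_seq_length=9)
print("data: %s, mask: %s" % (data, mask))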
data: tf.Tensor(
[[101 1 2 102 10 20 102 0 0]
[101 3 4 102 30 40 50 60 70]
[101 5 6 7 8 9 102 70 0]], shape=(3, 9), dtype=int32),
mask: tf.Tensor(
[[1 1 1 1 1 1 1 0 0]
[1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 0]], shape=(3, 9), dtype=int32)
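The behavior in this example can be approximated in plain Python. The sketch below is not the library's implementation; `pad_rows` is a hypothetical helper operating on lists rather than tensors, and it assumes the mask is determined by position (original item versus appended padding).

```python
def pad_rows(rows, max_seq_length, pad_value=0):
    """Pad or tail-truncate each row to max_seq_length and build a 0/1 mask."""
    padded, mask = [], []
    for row in rows:
        kept = list(row[:max_seq_length])    # drop everything past the limit
        n_pad = max_seq_length - len(kept)   # 0 when the row was truncated
        padded.append(kept + [pad_value] * n_pad)
        # Assumption of this sketch: 1 marks kept input, 0 marks appended padding.
        mask.append([1] * len(kept) + [0] * n_pad)
    return padded, mask

rows = [
    [101, 1, 2, 102, 10, 20, 102],
    [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
    [101, 5, 6, 7, 8, 9, 102, 70],
]
data, mask = pad_rows(rows, max_seq_length=9)
for d, m in zip(data, mask):
    print(d, m)
```

The first row gains two pad values, the second loses its last item (80), and the third gains one pad value, matching the output shown above.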
Args
input | A RaggedTensor or Tensor with rank >= 1.
max_seq_length | An int, or scalar Tensor. The "input" Tensor will be flattened down to 2 dimensions (if needed), and then have its inner dimension either padded out or truncated to this size.
pad_value | An int or scalar Tensor specifying the value used for padding.
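As an illustration of how max_seq_length and pad_value interact with a higher-rank input, here is a plain-Python sketch. It rests on an assumption: "flattened down to 2 dimensions" is read here as concatenating each batch row's inner lists in order. The helpers `flatten_row` and `flatten_and_pad` are hypothetical and are not part of tensorflow_text.

```python
def flatten_row(row):
    """Concatenate nested lists within one row into a flat list, in order."""
    flat = []
    for item in row:
        if isinstance(item, list):
            flat.extend(flatten_row(item))
        else:
            flat.append(item)
    return flat

def flatten_and_pad(rows, max_seq_length, pad_value=0):
    """Flatten each row, then pad or truncate it to max_seq_length."""
    out = []
    for row in rows:
        flat = flatten_row(row)[:max_seq_length]
        out.append(flat + [pad_value] * (max_seq_length - len(flat)))
    return out

# A rank-3 input: a batch of 2 rows, each holding variable-length groups.
nested = [
    [[1, 2], [3]],
    [[4], [5, 6], [7, 8]],
]
result = flatten_and_pad(nested, max_seq_length=4, pad_value=-1)
print(result)  # [[1, 2, 3, -1], [4, 5, 6, 7]]
```

A non-zero pad_value such as -1 can be useful when 0 is itself a valid token ID in the vocabulary.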
Returns
A tuple of (padded_input, pad_mask) where:
padded_input | A Tensor corresponding to input that has been padded or truncated to a fixed size and flattened to at most 2 dimensions.
pad_mask | A Tensor corresponding to padded_input whose values are 0 if the corresponding item is a pad value and 1 if it is not.
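One common use of the returned pair is recovering each row's unpadded length from the mask. The sketch below uses plain lists copied from the example output rather than tensors, and assumes the mask's ones form a prefix of each row, as in that output.

```python
data = [
    [101, 1, 2, 102, 10, 20, 102, 0, 0],
    [101, 3, 4, 102, 30, 40, 50, 60, 70],
    [101, 5, 6, 7, 8, 9, 102, 70, 0],
]
mask = [
    [1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 0],
]

# Summing a mask row counts the items that came from the original input.
lengths = [sum(row) for row in mask]

# Slicing by that count strips the padding again.
unpadded = [row[:n] for row, n in zip(data, lengths)]

print(lengths)  # [7, 9, 8]
```

Note that the second row's length of 9 reflects the truncated row, not the original 10 items, since truncated items cannot be recovered from the outputs.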
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-04-11 UTC.