Common Signatures for Text
This page describes common signatures that should be implemented by modules in
the TF1 Hub format for tasks that accept text inputs.
(For the TF2 SavedModel format, see the analogous
SavedModel API.)
Text feature vector
A text feature vector module creates a dense vector representation from text features.
It accepts a batch of strings of shape [batch_size] and maps them to a float32 tensor
of shape [batch_size, N]. This is often called a text embedding in dimension N.
Basic usage
import tensorflow_hub as hub

# Load a text embedding module in the TF1 Hub format.
embed = hub.Module("path/to/module")
# Map a batch of strings to a [batch_size, N] float32 tensor.
representations = embed([
    "A long sentence.",
    "single-word",
    "http://example.com"])
Feature column usage
import tensorflow as tf
import tensorflow_hub as hub

# Feed raw strings from the "comment" feature through the module.
feature_columns = [
    hub.text_embedding_column("comment", "path/to/module", trainable=False),
]
input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)
estimator = tf.estimator.DNNClassifier(hidden_units, feature_columns)
estimator.train(input_fn, max_steps=100)
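In the snippet above, features, labels, and hidden_units are not defined. As a purely
illustrative sketch (the example strings, labels, and layer sizes are made up), the
inputs to numpy_input_fn could look like this, with the dict key matching the column
name "comment":

import numpy as np

# Hypothetical training data; the dict key must match the feature column name.
features = {
    "comment": np.array(["great product", "would not buy again"]),
}
labels = np.array([1, 0])
hidden_units = [64, 16]  # Example DNN layer sizes, not prescribed by the module.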
Notes
Modules have been pre-trained on different domains and/or tasks, so not every text
feature vector module is suitable for your problem. For example, some modules may have
been trained on a single language.
This interface does not allow fine-tuning of the text representation on TPUs,
because it requires the module to instantiate both string processing and the
trainable variables at the same time.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2023-10-07 UTC."],[],[],null,["# Common Signatures for Text\n\n\u003cbr /\u003e\n\nThis page describes common signatures that should be implemented by modules in\nthe [TF1 Hub format](../tf1_hub_module) for tasks that accept text inputs.\n(For the [TF2 SavedModel format](../tf2_saved_model), see the analogous\n[SavedModel API](../common_saved_model_apis/text).)\n\nText feature vector\n-------------------\n\nA **text feature vector** module creates a dense vector representation\nfrom text features.\nIt accepts a batch of strings of shape `[batch_size]` and maps them to\na `float32` tensor of shape `[batch_size, N]`. This is often called\n**text embedding** in dimension `N`.\n\n### Basic usage\n\n embed = hub.Module(\"path/to/module\")\n representations = embed([\n \"A long sentence.\",\n \"single-word\",\n \"http://example.com\"])\n\n### Feature column usage\n\n feature_columns = [\n hub.text_embedding_column(\"comment\", \"path/to/module\", trainable=False),\n ]\n input_fn = tf.estimator.inputs.numpy_input_fn(features, labels, shuffle=True)\n estimator = tf.estimator.DNNClassifier(hidden_units, feature_columns)\n estimator.train(input_fn, max_steps=100)\n\nNotes\n-----\n\nModules have been pre-trained on different domains and/or tasks,\nand therefore not every text feature vector module would be suitable for\nyour problem. E.g.: some modules could have been trained on a single language.\n\nThis interface does not allow fine-tuning of the text representation on TPUs,\nbecause it requires the module to instantiate both string processing and the\ntrainable variables at the same time."]]