Module: tfds.deprecated.text
Text utilities.
tfds includes a set of TextEncoders as well as a Tokenizer to enable expressive, performant, and reproducible natural language research.
Classes
class ByteTextEncoder: Byte-encodes text.
class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.
class TextEncoder: Abstract base class for converting between text and integers.
class TextEncoderConfig: Configuration for tfds.features.Text.
class TokenTextEncoder: TextEncoder backed by a list of tokens.
class Tokenizer: Splits a string into tokens, and joins them back.
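The contract these classes describe (text to integers and back, plus a tokenizer whose output can be joined again) can be illustrated with a small standalone sketch. It does not use or reproduce the tfds implementation: the names SketchTextEncoder, SketchByteEncoder, sketch_tokenize and sketch_join are hypothetical, and reserving ID 0 for padding and splitting on word/non-word runs are assumptions made for illustration only.

```python
import re
from abc import ABC, abstractmethod
from typing import List


class SketchTextEncoder(ABC):
    """Hypothetical stand-in for an abstract text <-> integer converter."""

    @abstractmethod
    def encode(self, s: str) -> List[int]:
        """Convert a string into a list of integer IDs."""

    @abstractmethod
    def decode(self, ids: List[int]) -> str:
        """Convert a list of integer IDs back into a string."""

    @property
    @abstractmethod
    def vocab_size(self) -> int:
        """Number of distinct IDs the encoder can emit, padding included."""


class SketchByteEncoder(SketchTextEncoder):
    """Byte-level encoder: one ID per UTF-8 byte.

    Assumption of this sketch: ID 0 is reserved for padding, so every
    byte value is shifted up by one.
    """

    def encode(self, s: str) -> List[int]:
        return [b + 1 for b in s.encode("utf-8")]

    def decode(self, ids: List[int]) -> str:
        # Drop padding IDs (0) and undo the shift applied in encode().
        raw = bytes(i - 1 for i in ids if i > 0)
        return raw.decode("utf-8", errors="replace")

    @property
    def vocab_size(self) -> int:
        return 256 + 1  # 256 byte values plus the padding ID


def sketch_tokenize(s: str) -> List[str]:
    """Split into alternating runs of word and non-word characters."""
    return re.findall(r"\w+|\W+", s)


def sketch_join(tokens: List[str]) -> str:
    """Inverse of sketch_tokenize: concatenating the runs restores the text."""
    return "".join(tokens)


encoder = SketchByteEncoder()
ids = encoder.encode("héllo")
print(ids)
print(encoder.decode(ids + [0, 0]))  # trailing padding is ignored
print(sketch_tokenize("Hello, world!"))
```

Because every UTF-8 byte maps to exactly one ID, the byte-level scheme never meets an out-of-vocabulary input, which is why it works as a fallback for a subword vocabulary. The cost is longer ID sequences than a token-based or subword-based encoder would produce.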
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.