Text utilities.
tfds includes a set of TextEncoders as well as a Tokenizer to enable expressive, performant, and reproducible natural language research.
Classes
class ByteTextEncoder: Byte-encodes text.
class SubwordTextEncoder: Invertible TextEncoder using word pieces with a byte-level fallback.
class TextEncoder: Abstract base class for converting between text and integers.
class TextEncoderConfig: Configuration for tfds.features.Text.
class TokenTextEncoder: TextEncoder backed by a list of tokens.
class Tokenizer: Splits a string into tokens, and joins them back.
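To make the text-to-integer round trip concrete, here is a minimal pure-Python sketch of the two simplest ideas in the list above: byte-level encoding and encoding against a fixed token list. It does not use tfds and is not the library's implementation. The class names SketchByteEncoder and SketchTokenEncoder, the choice to reserve id 0 for padding, the regex-based word split, and the "UNK" placeholder are all assumptions made for illustration.

```python
import re


class SketchByteEncoder:
    """Hypothetical stand-in: maps text to integer ids via its UTF-8 bytes."""

    def encode(self, text):
        # Shift every byte by 1 so that id 0 stays free for padding
        # (an assumption of this sketch).
        return [b + 1 for b in text.encode("utf-8")]

    def decode(self, ids):
        # Skip padding ids (0) and undo the shift.
        return bytes(i - 1 for i in ids if i > 0).decode("utf-8")


class SketchTokenEncoder:
    """Hypothetical stand-in: maps words to ids using a fixed token list."""

    def __init__(self, vocab_list):
        self._tokens = list(vocab_list)
        # Ids start at 1; 0 is kept for padding (assumption of this sketch).
        self._token_to_id = {tok: i + 1 for i, tok in enumerate(self._tokens)}
        # One extra id stands for every out-of-vocabulary word.
        self._oov_id = len(self._tokens) + 1

    def encode(self, text):
        # Naive word split; a real tokenizer would handle more cases.
        words = re.findall(r"\w+", text)
        return [self._token_to_id.get(w, self._oov_id) for w in words]

    def decode(self, ids):
        words = []
        for i in ids:
            if i == 0:
                continue  # padding
            words.append(self._tokens[i - 1] if i <= len(self._tokens) else "UNK")
        return " ".join(words)


byte_encoder = SketchByteEncoder()
byte_ids = byte_encoder.encode("hi")
print(byte_ids)                               # [105, 106]
print(byte_encoder.decode(byte_ids + [0, 0])) # hi

token_encoder = SketchTokenEncoder(["hello", "world"])
token_ids = token_encoder.encode("hello there world")
print(token_ids)                              # [1, 3, 2]
print(token_encoder.decode(token_ids))        # hello UNK world
```

The byte-level sketch is invertible for any input because every string has a UTF-8 byte sequence, while the token-list sketch loses any word outside its vocabulary. That trade-off is the motivation for combining word pieces with a byte-level fallback, as the SubwordTextEncoder entry describes.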