tfm.nlp.models.T5TransformerParams
A dataclass that holds the hyperparameters of a T5 Transformer model.
tfm.nlp.models.T5TransformerParams(
num_layers: int,
d_model: int,
d_kv: int,
num_heads: int,
d_ff: int,
vocab_size: int,
target_vocab_size: Optional[int] = None,
dropout_rate: float = 0.0,
layer_norm_epsilon: float = 1e-06,
shared_embedding: bool = False,
vocab_embeddings_initializer: Optional[Initializer] = None,
relative_attention_num_buckets: int = 32,
relative_attention_max_distance: int = 128,
relative_embeddings_initializer: Optional[Initializer] = None,
weight_initializer: Optional[Initializer] = tfm.nlp.models.T5TransformerParams.weight_initializer,
bias_initializer: Optional[Initializer] = None,
rescale_query: bool = False,
bidirectional: bool = True,
ffn_activations: Sequence[str] = tfm.nlp.models.T5TransformerParams.ffn_activations,
logits_via_embedding: bool = True,
num_decoder_layers: Optional[int] = None,
one_hot_embedding: bool = True,
layer_sharing: bool = False,
use_shared_relative_position_bias: bool = True,
return_attention_scores: bool = False
)
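The signature above has six required fields and many defaulted ones. The following is a minimal sketch of constructing the dataclass. It assumes `tfm` refers to the `tensorflow_models` package, and the hyperparameter values are illustrative (roughly T5-small sized), not taken from this page. The helper `build_params` is hypothetical and returns `None` when the package is not installed.

```python
# Illustrative values for the six required fields (an assumption, not from this page).
required_kwargs = dict(
    num_layers=6,
    d_model=512,
    d_kv=64,
    num_heads=8,
    d_ff=2048,
    vocab_size=32128,
)


def build_params(**overrides):
    """Hypothetical helper: build T5TransformerParams, or return None if unavailable."""
    try:
        import tensorflow_models as tfm  # assumed import alias for `tfm`
    except ImportError:
        return None
    # Fields not passed here keep the defaults shown in the signature.
    return tfm.nlp.models.T5TransformerParams(**{**required_kwargs, **overrides})


params = build_params(dropout_rate=0.1)
if params is not None:
    print(params.num_layers, params.dropout_rate)
```

Keeping the required fields in one dictionary makes it straightforward to derive variants that differ only in a defaulted field such as `dropout_rate`.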
Attributes
num_layers | Dataclass field
d_model | Dataclass field
d_kv | Dataclass field
num_heads | Dataclass field
d_ff | Dataclass field
vocab_size | Dataclass field
target_vocab_size | Dataclass field
dropout_rate | Dataclass field
layer_norm_epsilon | Dataclass field
shared_embedding | Dataclass field
vocab_embeddings_initializer | Dataclass field
relative_attention_num_buckets | Dataclass field
relative_attention_max_distance | Dataclass field
relative_embeddings_initializer | Dataclass field
weight_initializer | Dataclass field
bias_initializer | Dataclass field
rescale_query | Dataclass field
bidirectional | Dataclass field
ffn_activations | Dataclass field
logits_via_embedding | Dataclass field
num_decoder_layers | Dataclass field
one_hot_embedding | Dataclass field
layer_sharing | Dataclass field
use_shared_relative_position_bias | Dataclass field
return_attention_scores | Dataclass field
Methods
weight_initializer
weight_initializer(
dtype=None, **kwargs
)
He normal initializer.
Also available via the shortcut function tf.keras.initializers.he_normal.
It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in), where fan_in is the number of input units in the weight tensor.
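To make the formula concrete, here is a standard-library sketch of the stddev computation and of sampling from a truncated normal. It is not the Keras implementation: the function names are hypothetical, and the two-standard-deviation cutoff with rejection sampling is an assumption of this sketch. Keras may apply further corrections.

```python
import math
import random


def he_normal_stddev(fan_in: int) -> float:
    """stddev = sqrt(2 / fan_in), as stated in the description above."""
    return math.sqrt(2.0 / fan_in)


def sample_truncated_normal(stddev: float, rng: random.Random, limit: float = 2.0) -> float:
    """Draw from N(0, stddev^2), resampling values beyond limit * stddev.

    The cutoff of 2 standard deviations is an assumption of this sketch.
    """
    while True:
        x = rng.gauss(0.0, stddev)
        if abs(x) <= limit * stddev:
            return x


rng = random.Random(0)
stddev = he_normal_stddev(512)  # fan_in = 512 input units
samples = [sample_truncated_normal(stddev, rng) for _ in range(1000)]
print(stddev)  # 0.0625
```

With `fan_in = 512` the formula gives sqrt(2 / 512) = 1/16, so wider layers start with proportionally smaller weights.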
Examples:
import tensorflow as tf

# Standalone usage:
initializer = tf.keras.initializers.HeNormal()
values = initializer(shape=(2, 2))

# Usage in a Keras layer:
initializer = tf.keras.initializers.HeNormal()
layer = tf.keras.layers.Dense(3, kernel_initializer=initializer)
Args
seed | A Python integer. Used to make the behavior of the initializer deterministic. Note that a seeded initializer will not produce the same random values across multiple calls, but multiple initializers will produce the same sequence when constructed with the same seed value.
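The seed semantics described above can be illustrated by analogy with Python's standard `random` module. `SeededSampler` below is a hypothetical stand-in, not a Keras class, and it only mirrors the behavior as this page states it. Actual initializer behavior may differ between Keras versions.

```python
import random


class SeededSampler:
    """Hypothetical stand-in for a seeded initializer (not part of Keras)."""

    def __init__(self, seed: int):
        # Each instance owns its own generator, seeded once at construction.
        self._rng = random.Random(seed)

    def __call__(self, n: int) -> list:
        return [self._rng.gauss(0.0, 1.0) for _ in range(n)]


a = SeededSampler(seed=42)
b = SeededSampler(seed=42)

first_a, second_a = a(3), a(3)
first_b, second_b = b(3), b(3)

print(first_a != second_a)  # successive calls on one instance differ
print(first_a == first_b)   # same seed, same first draw
print(second_a == second_b)  # same seed, same sequence overall
```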
__eq__
__eq__(
other
)
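Because the page lists every attribute as a dataclass field, `__eq__` is presumably the field-by-field comparison that Python's `@dataclass` decorator generates. That is an assumption, since this page does not document the method. The sketch below shows that behavior on `TinyParams`, a hypothetical stand-in with a few similarly named fields, not the real class.

```python
from dataclasses import dataclass


@dataclass
class TinyParams:
    """Hypothetical stand-in; not the real T5TransformerParams."""

    num_layers: int
    d_model: int
    dropout_rate: float = 0.0


a = TinyParams(num_layers=6, d_model=512)
b = TinyParams(num_layers=6, d_model=512)
c = TinyParams(num_layers=6, d_model=512, dropout_rate=0.1)

print(a == b)  # True: all fields match
print(a == c)  # False: dropout_rate differs
print(a is b)  # False: equal values, distinct objects
```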
Class Variables
bias_initializer | None
bidirectional | True
dropout_rate | 0.0
ffn_activations | ('relu',)
layer_norm_epsilon | 1e-06
layer_sharing | False
logits_via_embedding | True
num_decoder_layers | None
one_hot_embedding | True
relative_attention_max_distance | 128
relative_attention_num_buckets | 32
relative_embeddings_initializer | None
rescale_query | False
return_attention_scores | False
shared_embedding | False
target_vocab_size | None
use_shared_relative_position_bias | True
vocab_embeddings_initializer | None
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-02-02 UTC.