Counting the number of parameters in a GRU
My GRU model looks like this.
import tensorflow as tf


class CharGenModel(tf.keras.Model):
    def __init__(self, vocab_size, num_timesteps, embedding_dim, **kwargs):
        super(CharGenModel, self).__init__(**kwargs)
        self.embedding_layer = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # Note: num_timesteps is passed as the number of GRU units.
        self.rnn_layer = tf.keras.layers.GRU(
            num_timesteps,
            recurrent_initializer="glorot_uniform",
            recurrent_activation="sigmoid",
            stateful=True,
            return_sequences=True
        )
        self.dense_layer = tf.keras.layers.Dense(vocab_size)

    def call(self, x):
        # Print the tensor shape after each layer.
        print(x.shape)
        x = self.embedding_layer(x)
        print(x.shape)
        x = self.rnn_layer(x)
        print(x.shape)
        x = self.dense_layer(x)
        print(x.shape)
        return x


vocab_size = 92
embedding_dim = 256
seq_length = 100
batch_size = 64

model = CharGenModel(vocab_size, seq_length, embedding_dim)
model.build(input_shape=(batch_size, seq_length))
model.summary()
Building the model and calling model.summary() printed the following shapes and trainable parameter counts.
(64, 100)
(64, 100, 256)
(64, 100, 100)
(64, 100, 92)
Model: "char_gen_model_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_4 (Embedding)      multiple                  23552
gru_4 (GRU)                  multiple                  107400
dense_4 (Dense)              multiple                  9292
=================================================================
Total params: 140,244
Trainable params: 140,244
Non-trainable params: 0
I am confused by two things.
According to the definition of the Embedding layer,
tf.keras.layers.Embedding(
    input_dim,
    output_dim,
    embeddings_initializer='uniform',
    embeddings_regularizer=None,
    activity_regularizer=None,
    embeddings_constraint=None,
    mask_zero=False,
    input_length=None,
    **kwargs
)
In my application, input_dim for the embedding layer is 64x100.
(1) So why does the embedding layer have 92x256 = 23552 trainable parameters? Why not 100x256?
(2) As I understand it, the number of parameters in a GRU is counted as
num_params = number of FFNNs x [number of hidden units x (number of hidden units + number of inputs) + number of biases]
where
number of FFNNs (feedforward networks) in a GRU is 3
number of hidden units is 100
number of inputs is 256
number of biases is 100
so num_params = 3 x [100 x (100 + 256) + 100] = 107100.
But the model summary reports 107400.
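To double-check the arithmetic, here is a minimal plain-Python sketch (no TensorFlow needed) that recomputes the three layer counts and compares them with the summary. The helper names embedding_params, gru_params and dense_params, and the biases_per_gate argument, are hypothetical and introduced only for this illustration. The embedding count assumes one row of output_dim weights per input_dim entry, and the GRU count uses the formula stated above.

```python
# Hypothetical helpers for illustration only; they are not part of Keras.

def embedding_params(input_dim, output_dim):
    # One vector of size output_dim per entry of the lookup table.
    return input_dim * output_dim


def gru_params(units, inputs, biases_per_gate=1):
    # Formula from the question: 3 feedforward blocks, each with a
    # (units x (units + inputs)) weight matrix plus bias vector(s).
    return 3 * (units * (units + inputs) + biases_per_gate * units)


def dense_params(inputs, outputs):
    # Weight matrix plus one bias per output.
    return inputs * outputs + outputs


vocab_size, embedding_dim, hidden_units = 92, 256, 100

# Values copied from the model.summary() output.
reported = {"embedding": 23552, "gru": 107400, "dense": 9292}

computed = {
    "embedding": embedding_params(vocab_size, embedding_dim),
    "gru": gru_params(hidden_units, embedding_dim),
    "dense": dense_params(hidden_units, vocab_size),
}

for name in reported:
    diff = reported[name] - computed[name]
    print(f"{name}: computed={computed[name]} reported={reported[name]} diff={diff}")

# The GRU gap expressed in units of one bias vector (100 values).
gap = reported["gru"] - computed["gru"]
print("GRU gap:", gap, "=", gap // hidden_units, "x", hidden_units)
```

The embedding and dense counts agree with the summary when the embedding table has vocab_size rows. The GRU gap is 300 = 3 x 100, which is exactly one additional bias vector per gate; calling gru_params(100, 256, biases_per_gate=2) gives 107400. That would be consistent with the layer keeping two bias vectors per gate, which I believe is what the GRU layer's reset_after option does when enabled, but that is an assumption and not something this sketch verifies.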
Where did I go wrong in my calculation?