Why does the first LSTM in a Keras model have more params than the subsequent one?


Question

I was just looking at the Keras model details from a fairly straightforward sequential model where I have multiple LSTM layers, one after another. I was surprised to see that the first layer always has more params despite having the same definition as the subsequent LSTM layer.

The model definition here shows it clearly:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 400, 5)            380       
_________________________________________________________________
lstm_2 (LSTM)                (None, 400, 5)            220       
_________________________________________________________________
time_distributed_1 (TimeDist (None, 400, 650)          3900      
_________________________________________________________________
lstm_3 (LSTM)                (None, 400, 20)           53680     
_________________________________________________________________
lstm_4 (LSTM)                (None, 400, 20)           3280      
_________________________________________________________________

Similarly, after a time-distributed dense layer, the same is true of the next two identical LSTMs.

Is my understanding of LSTMs incorrect, in that an identical definition does not simply produce a duplicate of the previous layer tagged on the end? Or is there something else in the param count that I need to understand? Currently it just looks odd to me!

Any explanation would be great to help me (a) understand better, and (b) build more performant models based on this new knowledge.

Answer

The output shape of an LSTM layer depends only on its units.

Your first two layers both have 5 units, and the other two have 20 units.

But the trainable parameters (the numbers used to transform the inputs into the expected output) must account for how many input features arrive, since every input feature participates in the calculation.

The bigger the input, the more parameters are necessary. From the param counts we can tell that your input has more than 5 features. And for the last two layers, the first receives an input of size 650, compared with 20 for the other.

In the LSTM layer, as you can see in the Keras source code, there are 3 groups of weights:

  • kernel - shaped as (input_dim, 4*units)
  • recurrent kernel - shaped as (units, 4*units)
  • bias - shaped as (4*units,)
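Those three weight groups give the usual LSTM parameter formula: params = 4*(input_dim*units + units*units + units). A minimal sketch in plain Python (no Keras needed, illustrative function name) to check the counts in the summary above:

```python
def lstm_param_count(input_dim, units):
    """Parameter count of a Keras LSTM layer from its three weight groups."""
    kernel = input_dim * 4 * units  # input weights, shape (input_dim, 4*units)
    recurrent = units * 4 * units   # recurrent weights, shape (units, 4*units)
    bias = 4 * units                # bias, shape (4*units,)
    return kernel + recurrent + bias

print(lstm_param_count(13, 5))  # 380, matches lstm_1
print(lstm_param_count(5, 5))   # 220, matches lstm_2
```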

With some calculations, we can infer that your input has shape (None, 400, 13):

Layer (type)         Output Shape        Param #   
========================================================================
input_6 (InputLayer) (None, 400, 13)     0         
________________________________________________________________________
lstm_1 (LSTM)        (None, 400, 5)      380   = 4*(13*5 + 5*5 + 5)   
________________________________________________________________________
lstm_2 (LSTM)        (None, 400, 5)      220   = 4*(5*5 + 5*5 + 5)     
________________________________________________________________________
time_distributed_1   (None, 400, 650)    3900  = 5*650 + 650  
________________________________________________________________________
lstm_3 (LSTM)        (None, 400, 20)     53680 = 4*(650*20 + 20*20 + 20)   
________________________________________________________________________
lstm_4 (LSTM)        (None, 400, 20)     3280  = 4*(20*20 + 20*20 + 20)    
________________________________________________________________________

  • LSTM 1 parameters = 4*(13*5 + 5*5 + 5)
  • LSTM 2 parameters = 4*(5*5 + 5*5 + 5)
  • Time distributed = 5*650 + 650 (a Dense layer applied at each timestep)
  • LSTM 3 parameters = 4*(650*20 + 20*20 + 20)
  • LSTM 4 parameters = 4*(20*20 + 20*20 + 20)
If you test with a dense layer, you will also see that:

Layer (type)         Output Shape    Param #   
=========================================================
input_6 (InputLayer) (None, 13)      0      
_________________________________________________________
dense_1 (Dense)      (None, 5)       70    = 13*5 + 5      
_________________________________________________________
dense_2 (Dense)      (None, 5)       30    = 5*5 + 5   
_________________________________________________________
dense_3 (Dense)      (None, 650)     3900  = 5*650 + 650     
_________________________________________________________
dense_4 (Dense)      (None, 20)      13020 = 650*20 + 20   
_________________________________________________________
dense_5 (Dense)      (None, 20)      420   = 20*20 + 20    
=========================================================
      

The difference is that dense layers have no recurrent kernel, and their kernels are not multiplied by 4.

  • Dense 1 parameters = 13*5 + 5
  • Dense 2 parameters = 5*5 + 5
  • Dense 3 parameters = 5*650 + 650
  • Dense 4 parameters = 650*20 + 20
  • Dense 5 parameters = 20*20 + 20
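The dense formula can be checked the same way; note there is no recurrent term and no factor of 4 (a plain-Python sketch, illustrative function name):

```python
def dense_param_count(input_dim, units):
    """Parameter count of a Dense layer: kernel (input_dim, units) plus bias (units,)."""
    return input_dim * units + units

print(dense_param_count(13, 5))    # 70, matches dense_1
print(dense_param_count(650, 20))  # 13020, matches dense_4
```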
