Why does CuDNNLSTM have more parameters than LSTM in Keras?


Question

I have been trying to compute the number of parameters in an LSTM cell in Keras. I created two models, one with LSTM and the other with CuDNNLSTM.

Partial summaries of the two models:

CuDNNLSTM Model:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param # 
    =================================================================
    embedding (Embedding)        (None, None, 300)         192000
    _________________________________________________________________
    bidirectional (Bidirectional (None, None, 600)         1444800

LSTM Model:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding_1 (Embedding)      (None, None, 300)         192000    
    _________________________________________________________________  
    bidirectional (Bidirectional (None, None, 600)         1442400

The number of parameters in the LSTM model follows the standard formula for LSTM parameter computation found all over the internet (reproduced in the quick check below). However, CuDNNLSTM has 2,400 extra parameters.
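
As a quick check of the standard formula, using the dimensions from the summaries above (input_dim = units = 300, wrapped in Bidirectional):

    # Per direction, a standard Keras LSTM stores, for each of the 4 gates,
    # an input kernel (input_dim x units), a recurrent kernel (units x units),
    # and a single bias vector (units).
    input_dim, units = 300, 300
    per_direction = 4 * (input_dim * units + units * units + units)
    print(per_direction)      # 721200
    print(2 * per_direction)  # 1442400 -- matches the Bidirectional LSTM above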

What is the cause of these extra parameters?

Code

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    from tensorflow.compat.v1.keras.models import Sequential
    from tensorflow.compat.v1.keras.layers import CuDNNLSTM, Bidirectional, Embedding, LSTM

    model = Sequential()
    model.add(Embedding(640, 300))
    # <LSTM type> is a placeholder: either LSTM or CuDNNLSTM, depending on which model is built
    model.add(Bidirectional(<LSTM type>(300, return_sequences=True)))


Answer

LSTM parameters can be grouped into 3 categories: input weight matrices (W), recurrent weight matrices (R), and biases (b). Part of the LSTM cell's computation is W*x + b_i + R*h + b_r, where b_i are the input biases and b_r are the recurrent biases.

If you let b = b_i + b_r, you can rewrite the above expression as W*x + R*h + b. In doing so, you eliminate the need to keep two separate bias vectors (b_i and b_r); instead, you only need to store one vector (b).
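
A minimal numpy sketch of the equivalence (the variable names are illustrative, not cuDNN's actual parameter names), showing one gate's pre-activation under both conventions:

    import numpy as np

    units, input_dim = 300, 300
    rng = np.random.default_rng(0)
    x = rng.standard_normal(input_dim)           # current input
    h = rng.standard_normal(units)               # previous hidden state
    W = rng.standard_normal((units, input_dim))  # input weights (one gate)
    R = rng.standard_normal((units, units))      # recurrent weights (one gate)
    b_i = rng.standard_normal(units)             # input bias
    b_r = rng.standard_normal(units)             # recurrent bias

    # cuDNN convention: two separate bias vectors per gate
    z_cudnn = W @ x + b_i + R @ h + b_r

    # Keras convention: one merged bias per gate, b = b_i + b_r
    z_keras = W @ x + R @ h + (b_i + b_r)

    assert np.allclose(z_cudnn, z_keras)  # same result, one fewer stored vector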

cuDNN sticks with the original mathematical formulation and stores b_i and b_r separately; Keras does not, storing only the merged b. That is why cuDNN's LSTM has more parameters than Keras's.
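
That accounts for the 2,400-parameter gap exactly: one extra bias vector per gate, per direction. A quick sanity check using the numbers from the summaries above:

    # CuDNNLSTM stores one additional bias vector (length `units`) for each
    # of the 4 gates, in each of the 2 directions of the Bidirectional wrapper.
    units, gates, directions = 300, 4, 2
    print(directions * gates * units)  # 2400
    print(1444800 - 1442400)           # 2400 -- the gap between the two summaries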
