Converting state-parameters of Pytorch LSTM to Keras LSTM

Problem description

I was trying to port an existing trained PyTorch model into Keras.

During the porting, I got stuck at the LSTM layer.

The Keras implementation of an LSTM network seems to have three kinds of state matrices, while the PyTorch implementation has four.

For example, for a bidirectional LSTM with hidden_layers=64, input_size=512 & output size=128, the state parameters are as follows:

State params of Keras LSTM

[<tf.Variable 'bidirectional_1/forward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/forward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/forward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>]

State params of PyTorch LSTM

 ['rnn.0.rnn.weight_ih_l0', torch.Size([256, 512])],
 ['rnn.0.rnn.weight_hh_l0', torch.Size([256, 64])],
 ['rnn.0.rnn.bias_ih_l0', torch.Size([256])],
 ['rnn.0.rnn.bias_hh_l0', torch.Size([256])],
 ['rnn.0.rnn.weight_ih_l0_reverse', torch.Size([256, 512])],
 ['rnn.0.rnn.weight_hh_l0_reverse', torch.Size([256, 64])],
 ['rnn.0.rnn.bias_ih_l0_reverse', torch.Size([256])],
 ['rnn.0.rnn.bias_hh_l0_reverse', torch.Size([256])],

I tried to look into the code of both implementations but was not able to understand much.

Can someone please help me transform the 4-set of state params from PyTorch into the 3-set of state params in Keras?

Solution

They are really not that different. If you sum up the two bias vectors in PyTorch, the equations will be the same as what's implemented in Keras.

This is the LSTM formula from the PyTorch documentation:
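
    i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})
    f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})
    g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})
    o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})
    c_t = f_t \odot c_{t-1} + i_t \odot g_t
    h_t = o_t \odot \tanh(c_t)

Here b_{i*} are the input biases and b_{h*} are the recurrent biases.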

PyTorch uses two separate bias vectors: one for the input transformation (subscripts starting with i) and one for the recurrent transformation (subscripts starting with h).

In Keras LSTMCell:

        x_i = K.dot(inputs_i, self.kernel_i)
        x_f = K.dot(inputs_f, self.kernel_f)
        x_c = K.dot(inputs_c, self.kernel_c)
        x_o = K.dot(inputs_o, self.kernel_o)
        if self.use_bias:
            x_i = K.bias_add(x_i, self.bias_i)
            x_f = K.bias_add(x_f, self.bias_f)
            x_c = K.bias_add(x_c, self.bias_c)
            x_o = K.bias_add(x_o, self.bias_o)

        if 0 < self.recurrent_dropout < 1.:
            h_tm1_i = h_tm1 * rec_dp_mask[0]
            h_tm1_f = h_tm1 * rec_dp_mask[1]
            h_tm1_c = h_tm1 * rec_dp_mask[2]
            h_tm1_o = h_tm1 * rec_dp_mask[3]
        else:
            h_tm1_i = h_tm1
            h_tm1_f = h_tm1
            h_tm1_c = h_tm1
            h_tm1_o = h_tm1
        i = self.recurrent_activation(x_i + K.dot(h_tm1_i,
                                                  self.recurrent_kernel_i))
        f = self.recurrent_activation(x_f + K.dot(h_tm1_f,
                                                  self.recurrent_kernel_f))
        c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c,
                                                        self.recurrent_kernel_c))
        o = self.recurrent_activation(x_o + K.dot(h_tm1_o,
                                                  self.recurrent_kernel_o))

There's only one bias added in the input transformation. However, the equations would be equivalent if we sum up the two biases in PyTorch.
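
As a minimal sketch of that conversion for one direction, assuming the gate ordering (input, forget, cell, output) matches between the two implementations; pytorch_model and keras_lstm are hypothetical handles to the two models, and the state_dict keys are the ones from the question:

    def pytorch_lstm_weights_to_keras(weight_ih, weight_hh, bias_ih, bias_hh):
        # Keras expects [kernel (input_size, 4*units), recurrent_kernel (units, 4*units), bias (4*units,)].
        # PyTorch stores the transposed matrices plus two bias vectors,
        # so transpose the weights and sum the biases.
        kernel = weight_ih.detach().cpu().numpy().T
        recurrent_kernel = weight_hh.detach().cpu().numpy().T
        bias = (bias_ih + bias_hh).detach().cpu().numpy()
        return [kernel, recurrent_kernel, bias]

    # state = pytorch_model.state_dict()
    # keras_lstm.set_weights(pytorch_lstm_weights_to_keras(
    #     state['rnn.0.rnn.weight_ih_l0'],
    #     state['rnn.0.rnn.weight_hh_l0'],
    #     state['rnn.0.rnn.bias_ih_l0'],
    #     state['rnn.0.rnn.bias_hh_l0']))

For the Bidirectional wrapper you would repeat this with the *_reverse tensors and pass the forward weights followed by the backward weights to set_weights on the wrapped layer.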

The two-bias LSTM is what's implemented in cuDNN (see the developer guide). I'm really not that familiar with PyTorch, but I guess that's why they use two bias parameters. In Keras, the CuDNNLSTM layer also has two bias weight vectors.
