Understand Keras LSTM weights


Problem description

I understand how to multiply Dense layer weights to reproduce the predicted output, but how should I interpret the weight matrices of an LSTM model?
Here are some toy examples (ignore the fitting; this is only about the matrix multiplication).

Dense example:

from keras.models import Model 
from keras.layers import Input, Dense, LSTM
import numpy as np
np.random.seed(42)

X = np.array([[1, 2], [3, 4]])

I = Input(X.shape[1:])
D = Dense(2)(I)
linear_model = Model(inputs=[I], outputs=[D])
print('linear_model.predict:\n', linear_model.predict(X))

weight, bias = linear_model.layers[1].get_weights()
print('bias + X @ weights:\n', bias + X @ weight)

Output:

linear_model.predict:
 [[ 3.10299015  0.46077788]
 [ 7.12412453  1.17058146]]
bias + X @ weights:
 [[ 3.10299003  0.46077788]
 [ 7.12412441  1.17058146]]

LSTM example:

X = X.reshape(*X.shape, 1)
I = Input(X.shape[1:])
L = LSTM(2)(I)
lstm_model = Model(inputs=[I], outputs=[L])
print('lstm_model.predict:\n', lstm_model.predict(X))
print("weights I don't understand:\n")
# Print explicitly so the weights also show up when run as a script, not only in a notebook.
print(lstm_model.layers[1].get_weights())

Output:

lstm_model.predict:
 [[ 0.27675897  0.15364291]
 [ 0.49197391  0.04097994]]

weights I don't understand:
[array([[ 0.11056691,  0.03153521, -0.78214532,  0.04079598,  0.32587671,
          0.72789955,  0.58123612, -0.57094401]], dtype=float32),
 array([[-0.16277026, -0.43958429,  0.30112407,  0.07443386,  0.70584315,
          0.17196879, -0.14703408,  0.36694485],
        [-0.03672785, -0.55035251,  0.27230391, -0.45381972, -0.06399836,
         -0.00104597,  0.14719161, -0.62441903]], dtype=float32),
 array([ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.], dtype=float32)]

Solution

You can get the names of the weights from the tensor objects:

weight_tensors = lstm_model.layers[1].weights
weight_names = list(map(lambda x: x.name, weight_tensors))
print(weight_names)

Output:

['lstm_1/kernel:0', 'lstm_1/recurrent_kernel:0', 'lstm_1/bias:0']

From the source code you can see that each of those arrays is split along its last axis into four blocks of units columns each, in this order: input gate, forget gate, cell state (candidate), and output gate:

    self.kernel_i = self.kernel[:, :self.units]
    self.kernel_f = self.kernel[:, self.units: self.units * 2]
    self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
    self.kernel_o = self.kernel[:, self.units * 3:]

    self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
    self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
    self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
    self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]

    if self.use_bias:
        self.bias_i = self.bias[:self.units]
        self.bias_f = self.bias[self.units: self.units * 2]
        self.bias_c = self.bias[self.units * 2: self.units * 3]
        self.bias_o = self.bias[self.units * 3:]
    else:
        self.bias_i = None
        self.bias_f = None
        self.bias_c = None
        self.bias_o = None
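
To make the slicing above concrete, here is a minimal NumPy sketch that applies the same column ranges to the three arrays printed in the question. The helper name split_gates is hypothetical (it is not part of Keras), and the weights are copied by hand from the question's output rather than read from a live model.

```python
import numpy as np

units = 2  # the question builds LSTM(2)

# Arrays copied from the question's get_weights() output:
# kernel (input_dim, 4 * units), recurrent_kernel (units, 4 * units), bias (4 * units,)
kernel = np.array([[0.11056691, 0.03153521, -0.78214532, 0.04079598,
                    0.32587671, 0.72789955, 0.58123612, -0.57094401]])
recurrent_kernel = np.array([
    [-0.16277026, -0.43958429, 0.30112407, 0.07443386,
     0.70584315, 0.17196879, -0.14703408, 0.36694485],
    [-0.03672785, -0.55035251, 0.27230391, -0.45381972,
     -0.06399836, -0.00104597, 0.14719161, -0.62441903]])
bias = np.array([0., 0., 1., 1., 0., 0., 0., 0.])


def split_gates(w, units):
    """Hypothetical helper: cut the last axis into the i, f, c, o blocks."""
    return {name: w[..., k * units:(k + 1) * units]
            for k, name in enumerate("ifco")}


kernel_gates = split_gates(kernel, units)
recurrent_gates = split_gates(recurrent_kernel, units)
bias_gates = split_gates(bias, units)

for name in "ifco":
    print(name, kernel_gates[name].shape,
          recurrent_gates[name].shape, bias_gates[name])
```

The forget-gate bias block is the only non-zero one here. That is consistent with the LSTM layer's unit_forget_bias option, which initializes that block to ones.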

How those weights are used depends on the implementation. For the formulation, I always refer to Christopher Olah's blog post "Understanding LSTM Networks".
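
As an illustration of one such formulation, the sketch below replays the question's toy LSTM in plain NumPy using the weights printed there. It assumes the standard LSTM equations with tanh as the activation and a hard sigmoid, clip(0.2 * x + 0.5, 0, 1), as the gate activation. That appears to be the default for the Keras version used in the question, but other releases may default to a plain sigmoid, so treat the activation choice as an assumption. The function names hard_sigmoid and lstm_forward are made up for this example.

```python
import numpy as np


def hard_sigmoid(x):
    # Assumed gate activation: piecewise-linear approximation of the sigmoid.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)


def lstm_forward(X, kernel, recurrent_kernel, bias, units):
    """Return the last hidden state for X of shape (batch, timesteps, features)."""
    h = np.zeros((X.shape[0], units))  # hidden state
    c = np.zeros((X.shape[0], units))  # cell state
    for t in range(X.shape[1]):
        # All four gate pre-activations at once, shape (batch, 4 * units).
        z = X[:, t, :] @ kernel + h @ recurrent_kernel + bias
        z_i, z_f, z_c, z_o = np.split(z, 4, axis=1)  # order: i, f, c, o
        i = hard_sigmoid(z_i)         # input gate
        f = hard_sigmoid(z_f)         # forget gate
        o = hard_sigmoid(z_o)         # output gate
        c = f * c + i * np.tanh(z_c)  # new cell state
        h = o * np.tanh(c)            # new hidden state
    return h


# Weights and predictions copied from the question's output.
kernel = np.array([[0.11056691, 0.03153521, -0.78214532, 0.04079598,
                    0.32587671, 0.72789955, 0.58123612, -0.57094401]])
recurrent_kernel = np.array([
    [-0.16277026, -0.43958429, 0.30112407, 0.07443386,
     0.70584315, 0.17196879, -0.14703408, 0.36694485],
    [-0.03672785, -0.55035251, 0.27230391, -0.45381972,
     -0.06399836, -0.00104597, 0.14719161, -0.62441903]])
bias = np.array([0., 0., 1., 1., 0., 0., 0., 0.])
keras_prediction = np.array([[0.27675897, 0.15364291],
                             [0.49197391, 0.04097994]])

# Same input as the question: two samples, two timesteps, one feature.
X = np.array([[1, 2], [3, 4]], dtype=float).reshape(2, 2, 1)

manual = lstm_forward(X, kernel, recurrent_kernel, bias, units=2)
print("manual forward pass:\n", manual)
print("matches lstm_model.predict:",
      np.allclose(manual, keras_prediction, atol=1e-4))
```

Under these assumptions the manual pass agrees with the lstm_model.predict values from the question to roughly float32 precision, which suggests the gate order i, f, c, o and the hard sigmoid match what the layer computed there.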
