LSTM Initial state from Dense layer


Question

I am using an LSTM on time series data. I have features about the time series that are not time dependent. Imagine company stocks for the series and something like company location among the non-time-series features. This is not the actual use case, but it is the same idea. For this example, let's just predict the next value in the time series.

一个简单的例子是:

from keras.layers import Input, Dense, LSTM
from keras.models import Model

feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)

series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])

However, I am just not sure how to specify the initial state correctly. I get:

ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)

which I can see is caused by the 3-D batch dimension. I tried using Flatten, Permute, and Reshape layers, but I don't believe that is correct. What am I missing, and how can I connect these layers?

Answer

The first problem is that an LSTM(8) layer expects two initial states, h_0 and c_0, each of dimension (None, 8). That's what "cell.state_size is (8, 8)" in the error message means.

If you only have one initial state, dense_2, you could switch to a GRU (which requires only h_0). Alternatively, you can transform your feature_input into two initial states.
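The GRU route can be sketched like this. It is a minimal illustration, not the asker's actual model: it assumes the tf.keras API, and the 5-feature input shapes are made-up placeholder values.

```python
# Sketch: a GRU takes a single initial state h_0, so one Dense output
# can seed it directly (tf.keras assumed; shapes are illustrative).
from tensorflow.keras.layers import Input, Dense, GRU
from tensorflow.keras.models import Model

feature_input = Input(shape=(5,))                      # static features, no time axis
state_h = Dense(8, activation='relu')(feature_input)   # (batch, 8) -> matches GRU units

series_input = Input(shape=(None, 5))                  # (batch, timesteps, 5)
gru = GRU(8)(series_input, initial_state=state_h)      # GRU needs only h_0
out = Dense(1, activation='sigmoid')(gru)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')
```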

The second problem is that h_0 and c_0 must have shape (batch_size, 8), but your dense_2 has shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as an initial state.

So you can either change your input shape to (data.training_features.shape[1],), or average over the timesteps with GlobalAveragePooling1D.
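If the feature input must remain 3-D, the pooling option might look like the sketch below. Again this assumes tf.keras, and the 5-feature shapes are illustrative placeholders rather than the asker's real dimensions.

```python
# Sketch: collapse the time dimension with GlobalAveragePooling1D so
# the Dense outputs have shape (batch, 8), as required for h_0 and c_0.
from tensorflow.keras.layers import (Input, Dense, LSTM,
                                     GlobalAveragePooling1D)
from tensorflow.keras.models import Model

feature_input = Input(shape=(None, 5))             # (batch, timesteps, 5)
pooled = GlobalAveragePooling1D()(feature_input)   # (batch, 5): timesteps averaged away
state_h = Dense(8, activation='relu')(pooled)      # h_0: (batch, 8)
state_c = Dense(8, activation='relu')(pooled)      # c_0: (batch, 8)

series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[state_h, state_c])
out = Dense(1, activation='sigmoid')(lstm)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')
```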

A working example would be:

from keras.layers import Input, Dense, LSTM
from keras.models import Model

feature_input = Input(shape=(5,))
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)

series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
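For completeness, the working example can be exercised end-to-end with random dummy data to confirm the shapes line up. This sketch assumes the tf.keras API; the batch size of 4, 10 timesteps, and 5 features are arbitrary illustrative values, not part of the original answer.

```python
# Build the same two-branch model and fit it on random data, purely as
# a shape check (tf.keras assumed; all dimensions are illustrative).
import numpy as np
from tensorflow.keras.layers import Input, Dense, LSTM
from tensorflow.keras.models import Model

feature_input = Input(shape=(5,))
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)

series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation='sigmoid')(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')

features = np.random.rand(4, 5).astype('float32')    # static per-series features
series = np.random.rand(4, 10, 5).astype('float32')  # (batch, timesteps, features)
targets = np.random.rand(4, 1).astype('float32')

history = model.fit([features, series], targets, epochs=1, verbose=0)
```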

