LSTM Initial state from Dense layer
Problem description
I am using an LSTM on time series data. I have features about the time series that are not time dependent. Imagine company stocks for the series, with things like company location among the non-time-series features. This is not the actual use case, but it is the same idea. For this example, let's just predict the next value in the time series.
A trivial example would be:
from keras.layers import Input, Dense, LSTM
from keras.models import Model

feature_input = Input(shape=(None, data.training_features.shape[1]))
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)
series_input = Input(shape=(None, data.training_series.shape[1]))
lstm = LSTM(8)(series_input, initial_state=dense_2)
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])
However, I am just not sure how to correctly specify the initial state as a list. I get:
ValueError: An initial_state was passed that is not compatible with `cell.state_size`. Received `state_spec`=[<keras.engine.topology.InputSpec object at 0x11691d518>]; However `cell.state_size` is (8, 8)
which I can see is caused by the 3D batch dimension. I tried using Flatten, Permute, and Reshape layers, but I don't believe that is correct. What am I missing, and how can I connect these layers?
Answer
The first problem is that an LSTM(8) layer expects two initial states, h_0 and c_0, each of dimension (None, 8). That is what "cell.state_size is (8, 8)" means in the error message.
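You can see these two states directly by asking the LSTM to return them. The following sketch (using tf.keras, with an illustrative input width of 5) shows that a single LSTM(8) layer carries two state tensors, h and c, each of width 8, which is exactly why the error reports cell.state_size as (8, 8):

```python
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

# return_state=True exposes the LSTM's two internal state tensors:
# h (hidden state) and c (cell state), each of shape (batch, 8).
x = Input(shape=(None, 5))
out, h, c = LSTM(8, return_state=True)(x)
probe = Model(inputs=x, outputs=[out, h, c])
```

Any initial_state you pass must match this structure: a list of two tensors, one for h_0 and one for c_0.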
If you only have one initial state, dense_2, maybe you can switch to GRU (which requires only h_0). Alternatively, you can transform your feature_input into two initial states.
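The GRU route can be sketched as follows. This is a minimal illustration (tf.keras, with an assumed feature width of 5): because a GRU carries only the hidden state h_0, a single dense_2 vector of shape (batch_size, 8) can be passed directly:

```python
from tensorflow.keras.layers import Input, Dense, GRU
from tensorflow.keras.models import Model

feature_input = Input(shape=(5,))                # 2-D: no time axis to remove
dense_1 = Dense(4, activation='relu')(feature_input)
dense_2 = Dense(8, activation='relu')(dense_1)   # shape (batch_size, 8)

series_input = Input(shape=(None, 5))            # (batch_size, timesteps, 5)
gru = GRU(8)(series_input, initial_state=dense_2)  # single h_0 suffices
out = Dense(1, activation='sigmoid')(gru)

model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam')
```

Note that the feature input is declared 2-D here, so the second (time-dimension) problem below does not arise.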
The second problem is that h_0 and c_0 have shape (batch_size, 8), but your dense_2 has shape (batch_size, timesteps, 8). You need to deal with the time dimension before using dense_2 as an initial state.
So maybe you can change your input shape to (data.training_features.shape[1],), or average over the timesteps with GlobalAveragePooling1D.
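If the features really do come in as a sequence, the pooling route could look like the sketch below (tf.keras, with an assumed feature width of 5). GlobalAveragePooling1D collapses (batch, timesteps, 8) to (batch, 8); reusing the pooled vector for both h_0 and c_0 is my own simplification here, and you could equally derive two separate Dense projections:

```python
from tensorflow.keras.layers import (Input, Dense, LSTM,
                                     GlobalAveragePooling1D)
from tensorflow.keras.models import Model

feature_input = Input(shape=(None, 5))                # (batch, timesteps, 5)
dense_2 = Dense(8, activation='relu')(feature_input)  # (batch, timesteps, 8)
pooled = GlobalAveragePooling1D()(dense_2)            # (batch, 8)

series_input = Input(shape=(None, 5))
# LSTM needs both h_0 and c_0; the pooled vector is reused for both here.
lstm = LSTM(8)(series_input, initial_state=[pooled, pooled])
out = Dense(1, activation='sigmoid')(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
```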
A working example would be:
from keras.layers import Input, Dense, LSTM
from keras.models import Model

feature_input = Input(shape=(5,))
dense_1_h = Dense(4, activation='relu')(feature_input)
dense_2_h = Dense(8, activation='relu')(dense_1_h)
dense_1_c = Dense(4, activation='relu')(feature_input)
dense_2_c = Dense(8, activation='relu')(dense_1_c)
series_input = Input(shape=(None, 5))
lstm = LSTM(8)(series_input, initial_state=[dense_2_h, dense_2_c])
out = Dense(1, activation="sigmoid")(lstm)
model = Model(inputs=[feature_input, series_input], outputs=out)
model.compile(loss='mean_squared_error', optimizer='adam', metrics=["mape"])