3-vector series LSTM can't break 0.5 accuracy

Problem description

I have a toy series dataset of 3-vectors in the form of

[[0, 0, 2], [1, 0, 3], [2, 0, 4], [3, 0, 2], [4, 0, 3], [5, 0, 4] ... [10001, 0, 4]]

x always goes up by one, y is always 0, z repeats 2, 3, 4. I want to predict the next 3-vector in the sequence given a starting sequence. I'm using a window size of 32, but have also tried 256 with identical results.
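The question does not show how the series and its windows were built, so the following is only a minimal sketch of one way to do it with NumPy. The names `series`, `make_windows`, `X`, and `Y` are hypothetical; the resulting shapes match the ones given in the comments of the model code.

```python
import numpy as np

WINDOW = 32  # window size from the question

# Rebuild the toy series: x counts up, y stays 0, z cycles through 2, 3, 4.
steps = np.arange(10002)
series = np.stack([steps, np.zeros_like(steps), 2 + steps % 3], axis=1)

def make_windows(data, window):
    """Pair every run of `window` consecutive rows with the row that follows it."""
    inputs = np.stack([data[i:i + window] for i in range(len(data) - window)])
    targets = data[window:]
    return inputs, targets

X, Y = make_windows(series, WINDOW)
print(X.shape, Y.shape)  # (9970, 32, 3) (9970, 3)
print(Y[0])              # the 33rd item of the series
```

The first target row is the 33rd item, `[32, 0, 4]`, which is the "real value" quoted in the question.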

I normalize each dimension to be between 0 and 1 before sending it into the model. No matter how many layers, units, or features I add, the model doesn't get more accurate than about 0.5, and I'd like to understand why.
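The normalization code is not shown, so this is a sketch under the assumption that it is a per-column min-max scaling. The helper names are hypothetical. Two details are worth checking in any real version: the y column is constant, so its range is zero and needs a guard against division by zero, and predictions have to be mapped back with the same minimum and range before they are compared with raw values.

```python
import numpy as np

def min_max_fit(data):
    """Return per-column minimum and range; constant columns get range 1."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # y is always 0, so avoid dividing by zero
    return col_min, col_range

def min_max_transform(data, col_min, col_range):
    """Scale raw values into [0, 1] column by column."""
    return (data - col_min) / col_range

def min_max_inverse(scaled, col_min, col_range):
    """Map scaled values (for example model predictions) back to raw units."""
    return scaled * col_range + col_min

# Same toy series as in the question: x counts up, y is 0, z cycles 2, 3, 4.
steps = np.arange(10002)
series = np.stack([steps, np.zeros_like(steps), 2 + steps % 3], axis=1).astype("float64")

col_min, col_range = min_max_fit(series)
scaled = min_max_transform(series, col_min, col_range)
restored = min_max_inverse(scaled, col_min, col_range)
```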

The prediction I get for the 33rd item is [4973.29 0.000 3.005] whereas the real value is [32 0 4] and I don't know if that's wrong because of the 0.5 accuracy or because of something else.

My model is as follows:

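# Imports assumed here; the original post does not show them
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# X_modified shape: (9970, 32, 3)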
# Y_modified shape: (9970, 3)

model = Sequential()

model.add(LSTM(units=128, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=128, return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(units=128))
model.add(Dropout(0.2))

model.add(Dense(Y_modified.shape[1], activation='softmax'))

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

Here is the model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_204 (LSTM)              (None, 32, 128)           67584     
_________________________________________________________________
dropout_199 (Dropout)        (None, 32, 128)           0         
_________________________________________________________________
lstm_205 (LSTM)              (None, 32, 128)           131584    
_________________________________________________________________
dropout_200 (Dropout)        (None, 32, 128)           0         
_________________________________________________________________
lstm_206 (LSTM)              (None, 128)               131584    
_________________________________________________________________
dropout_201 (Dropout)        (None, 128)               0         
_________________________________________________________________
dense_92 (Dense)             (None, 3)                 387       
=================================================================
Total params: 331,139
Trainable params: 331,139
Non-trainable params: 0
_________________________________________________________________
None

Any insight is greatly appreciated, thank you!

Recommended answer

Right now your model is set up as a classifier but from your description it seems you are trying to solve a regression problem. Let me know if I am misunderstanding.
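One possible reading of the 0.5 figure, offered as an assumption rather than something stated in this answer: when the loss is `categorical_crossentropy`, Keras interprets `'accuracy'` as categorical accuracy, which only checks whether the largest entry of the prediction sits at the same index as the largest entry of the target. The sketch below assumes min-max scaling over the whole series and counts which index is largest in each scaled target.

```python
import numpy as np

WINDOW = 32

# Toy series from the question, scaled per column into [0, 1] (assumed min-max).
steps = np.arange(10002)
series = np.stack([steps, np.zeros_like(steps), 2 + steps % 3], axis=1).astype("float64")
col_min = series.min(axis=0)
col_range = series.max(axis=0) - col_min
col_range = np.where(col_range == 0, 1.0, col_range)  # y is constant
scaled = (series - col_min) / col_range

# Targets are the rows that follow each 32-step window.
targets = scaled[WINDOW:]

# Categorical accuracy compares argmax(target) with argmax(prediction).
target_class = targets.argmax(axis=1)
class_share = np.bincount(target_class, minlength=3) / len(targets)
print(class_share)
```

Under these assumptions the largest entry is the x column for roughly half of the targets and the z column for the other half, and never the y column. A model that always puts its largest softmax output at the same index would then score about 0.5, whatever its size. This does not prove that is what happened in the question, but it fits the symptom and supports treating the task as regression.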

Try changing the activation on the final dense layer to 'linear'. Also change the loss function to 'mean_squared_error' or another regression loss. https://keras.io/losses/

You will not get an accuracy score on a regression problem. Instead you will see the mean squared error and any other regression metrics you add, such as 'mae' for mean absolute error, which gives a more human-readable error number.

You should be able to solve this with a small network so increasing the number of layers and units is not necessary.
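Putting the suggestions above together, a regression version of the model might look like the sketch below. The single LSTM layer with 32 units, the `tensorflow.keras` import path, and the helper name `build_regression_model` are assumptions chosen for illustration, not settings taken from the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_regression_model(window=32, features=3, units=32):
    """Small LSTM regressor: linear output, MSE loss, MAE as a readable metric."""
    model = keras.Sequential([
        keras.Input(shape=(window, features)),
        layers.LSTM(units),
        layers.Dense(features, activation="linear"),  # one output per dimension
    ])
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mae"])
    return model

model = build_regression_model()

# Shape check on a dummy batch of two windows (all zeros, not real data).
dummy = np.zeros((2, 32, 3), dtype="float32")
pred = model.predict(dummy, verbose=0)
print(pred.shape)
```

Training would then be a call such as `model.fit(X_modified, Y_modified, ...)` on the normalized arrays, with predictions mapped back to the original scale afterwards.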

In response to your comment

If the time series don't interact with each other, then there isn't really any reason to predict them at the same time, so you'll have to decide that first. Here is how you could change them to classification problems if you want.

Based on your description I can't see a way to frame the X axis as a classification problem, since it is just an increasing number.

For the Y axis you could have the network predict whether the next point will be zero or not. The labels for this axis would be either 0 or 1, depending on whether the point is 0 or not. The final layer would be a dense layer with 1 unit and sigmoid activation. However, if the occurrences of non-zero values are completely random, then it would be impossible to predict them accurately.
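As a small illustration of those binary labels, the sketch below uses made-up y values, since the y column shown in the question is always 0. Marking "is zero" as 1 is an arbitrary choice; the opposite convention works just as well as long as it is applied consistently.

```python
import numpy as np

# Hypothetical y values for the next point of several windows (not from the question).
y_values = np.array([0, 0, 5, 0, -2, 0])

# Label 1 when the next y value is zero, 0 otherwise (assumed convention).
y_labels = (y_values == 0).astype("float32")
print(y_labels)
```

Labels of this form pair with a 1-unit sigmoid output and a `binary_crossentropy` loss.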

For the Z axis you could frame it as a multiclass classification problem. Your labels would have a width of 3, where the correct number is one-hot encoded. So if the next Z axis value was 2, then your label would be [1, 0, 0]. The final layer should be a dense layer with 3 units. The activation should be softmax because you want it to select 1 of the 3 options, as opposed to a sigmoid activation, which could select any combination of the three.
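The one-hot encoding described above can be sketched in a few lines of NumPy. The names `Z_CLASSES`, `one_hot_z`, and `decode_z` are hypothetical helpers, and the class order 2, 3, 4 is assumed from the example label in the text.

```python
import numpy as np

Z_CLASSES = np.array([2, 3, 4])  # the three values z cycles through

def one_hot_z(z_values):
    """Turn raw z values into rows with a single 1 at the matching class."""
    z_values = np.asarray(z_values)
    return (z_values[:, None] == Z_CLASSES).astype("float32")

def decode_z(probabilities):
    """Map softmax outputs back to the most likely raw z value."""
    return Z_CLASSES[np.asarray(probabilities).argmax(axis=1)]

labels = one_hot_z([2, 3, 4, 2])
print(labels)
print(decode_z([[0.1, 0.7, 0.2]]))
```

With labels in this form, `categorical_crossentropy` and the accuracy metric are meaningful again, because each target really does name exactly one class.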

You could predict all of these in one network if you used the Keras functional API to build a multi-output model.
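A multi-output version might be sketched as follows. The shared 32-unit LSTM, the output names `x_out`, `y_out`, `z_out`, and the choice to keep X as a regression output are assumptions made for illustration; only the three output heads follow directly from the description above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_multi_output_model(window=32, features=3, units=32):
    """One shared LSTM encoder feeding a separate head per axis."""
    inputs = keras.Input(shape=(window, features))
    encoded = layers.LSTM(units)(inputs)

    x_out = layers.Dense(1, activation="linear", name="x_out")(encoded)   # regression
    y_out = layers.Dense(1, activation="sigmoid", name="y_out")(encoded)  # zero or not
    z_out = layers.Dense(3, activation="softmax", name="z_out")(encoded)  # one of 2, 3, 4

    model = keras.Model(inputs=inputs, outputs=[x_out, y_out, z_out])
    model.compile(
        optimizer="adam",
        loss={
            "x_out": "mean_squared_error",
            "y_out": "binary_crossentropy",
            "z_out": "categorical_crossentropy",
        },
        metrics={"x_out": ["mae"], "y_out": ["accuracy"], "z_out": ["accuracy"]},
    )
    return model

model = build_multi_output_model()

# Shape check on a dummy batch of two windows (all zeros, not real data).
dummy = np.zeros((2, 32, 3), dtype="float32")
preds = model.predict(dummy, verbose=0)
print([p.shape for p in preds])
```

Each head gets its own loss, so the accuracy reported for `z_out` describes only the three-way classification and is not mixed with the regression error on X.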
