Validation Loss Much Higher Than Training Loss


Problem Description


I'm very new to deep learning models, and trying to train a multiple time series model using an LSTM with Keras Sequential. There are 25 observations per year for 50 years = 1250 samples, so I'm not sure whether it's even possible to use an LSTM on such a small dataset. However, I have thousands of feature variables, not including time lags. I'm trying to predict the sequence of the next 25 time steps of data. The data is normalized between 0 and 1. My problem is that, despite trying many obvious adjustments, I cannot get the LSTM validation loss anywhere close to the training loss (it's overfitting dramatically, I think).


I have tried adjusting the number of nodes per hidden layer (25-375), the number of hidden layers (1-3), dropout (0.2-0.8), batch_size (25-375), and the train/test split (90%:10% to 50%:50%). Nothing really makes much of a difference in the validation-loss/training-loss disparity.

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dropout, Dense

# SPLIT INTO TRAIN AND TEST SETS
# 25 observations per year; allocate 5 years (2014-2018) for testing
# (note: this takes the FIRST n_test rows, so rows must be ordered newest-first)
n_test = 5 * 25
test = values[:n_test, :]
train = values[n_test:, :]

# split into input and outputs
train_X, train_y = train[:, :-25], train[:, -25:]
test_X, test_y = test[:, :-25], test[:, -25:]

# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], 5, newdf.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 5, newdf.shape[1]))
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)


# design network
model = Sequential()
model.add(Masking(mask_value=-99, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(375, return_sequences=True))
model.add(Dropout(0.8))
model.add(LSTM(125, return_sequences=True))
model.add(Dropout(0.8))
model.add(LSTM(25))
model.add(Dense(25))
model.compile(loss='mse', optimizer='adam')
# fit network
history = model.fit(train_X, train_y, epochs=20, batch_size=25, validation_data=(test_X, test_y), verbose=2, shuffle=False)
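The reshape step above can be sanity-checked with a small standalone sketch; the feature count per timestep used below (200) is a made-up stand-in for the real `newdf.shape[1]`:

```python
import numpy as np

# Hypothetical dimensions, for illustration only
n_samples, n_timesteps, n_features = 1125, 5, 200
n_targets = 25

# Flat rows: 5 timesteps * 200 lagged features, plus 25 target columns
values = np.random.rand(n_samples, n_timesteps * n_features + n_targets)

train_X, train_y = values[:, :-n_targets], values[:, -n_targets:]
# reshape input to 3D [samples, timesteps, features], as Keras LSTMs expect
train_X = train_X.reshape((train_X.shape[0], n_timesteps, n_features))

print(train_X.shape, train_y.shape)  # (1125, 5, 200) (1125, 25)
```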


Epoch 19/20 - 14s - loss: 0.0512 - val_loss: 188.9568


Epoch 20/20 - 14s - loss: 0.0510 - val_loss: 188.9537

[Plot of validation loss vs. training loss; link in original post]


I assume I must be doing something obviously wrong, but can't see it since I'm a newbie. I am hoping to either get some useful validation loss achieved (compared to training), or to learn that my data observations are simply not numerous enough for useful LSTM modeling. Any help or suggestions are much appreciated, thanks!

Answer


Overfitting

In general, if you're seeing much higher validation loss than training loss, then it's a sign that your model is overfitting - it learns "superstitions" i.e. patterns that accidentally happened to be true in your training data but don't have a basis in reality, and thus aren't true in your validation data.


It's generally a sign that you have a "too powerful" model, too many parameters that are capable of memorizing the limited amount of training data. In your particular model you're trying to learn almost a million parameters (try printing model.summary()) from a thousand datapoints - that's not reasonable, learning can extract/compress information from data, not create it out of thin air.
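To see where that parameter count comes from: each LSTM layer has four gates, and each gate has an input kernel, a recurrent kernel, and a bias. A minimal sketch of the count (the 250-feature input width below is a hypothetical example, not a figure from the question):

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters in one LSTM layer: 4 gates, each with a
    kernel (input_dim x units), a recurrent kernel (units x units),
    and a bias vector (units)."""
    return 4 * (units * (input_dim + units) + units)

# First layer of the model above, assuming a hypothetical
# 250 features per timestep:
print(lstm_param_count(375, 250))  # 939000
```

With hundreds of thousands of parameters in the first layer alone and roughly a thousand training rows, memorization is almost guaranteed.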


The first question you should ask (and answer!) before building a model is about the expected accuracy. You should have a reasonable lower bound (what's a trivial baseline? For time series prediction, e.g. linear regression might be one) and an upper bound (what could an expert human predict given the same input data and nothing else?).
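A trivial lower-bound baseline is cheap to compute. As one possible sketch, a persistence forecast (predict that the next 25 steps all equal the last observed value), scored with MSE on a toy normalized series:

```python
import numpy as np

def persistence_baseline_mse(series, horizon=25):
    """Naive baseline: predict that the next `horizon` values all equal
    the last observed value, and score the prediction with MSE."""
    last = series[:-horizon][-1]   # last value before the holdout tail
    actual = series[-horizon:]     # the held-out tail
    preds = np.full(horizon, last)
    return float(np.mean((preds - actual) ** 2))

series = np.linspace(0.0, 1.0, 100)  # toy series normalized to [0, 1]
print(persistence_baseline_mse(series))
```

A model that can't beat this kind of baseline on the validation set isn't extracting any real signal from the inputs.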


Much depends on the nature of the problem. You really have to ask: is this information sufficient to get a good answer? For many real-life problems involving time series prediction, the answer is no - the future state of such a system depends on many variables that can't be determined by simply looking at historical measurements - to reasonably predict the next value, you need to bring in lots of external data other than the historical prices. There's a classic quote by Tukey: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

