How can I get weights to converge in a way that minimizes the MSE?


Problem description

Here is my code:

# Imports assumed by this snippet (not shown in the original post).
from keras import backend as K
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

for _ in range(5):
    K.clear_session()
    model = Sequential()

    model.add(LSTM(256, input_shape=(None, 1)))
    model.add(Dropout(0.2))

    model.add(Dense(256))
    model.add(Dropout(0.2))

    model.add(Dense(1))

    # 'rmsprop' is the correct optimizer identifier (the original
    # 'RmsProp' raises an error); the 'accuracy' metric was dropped
    # because it is not meaningful for regression.
    model.compile(loss='mean_squared_error', optimizer='rmsprop')
    hist = model.fit(x_train, y_train, epochs=20, batch_size=64,
                     verbose=0, validation_data=(x_val, y_val))

    p = model.predict(x_test)
    print(mean_squared_error(y_test, p))

    plt.plot(y_test)
    plt.plot(p)
    plt.legend(['testY', 'p'], loc='upper right')
    plt.show()

Total params: 330,241; samples: 2,264

Here are the results:

I changed nothing. I only re-ran the loop.

As you can see in the picture, the MSE results are huge and vary from run to run, even though I just ran the same for loop.

I think the fundamental cause of this problem is that the optimizer cannot find the global minimum; it converges to a local minimum instead. I believe this because, after checking all the loss graphs, the loss stops decreasing significantly after about 20 epochs. So in order to solve this problem, I have to find the global minimum. How should I do this?

I tried adjusting the batch_size and the number of epochs. I also tried changing the hidden layer size and the number of LSTM units, adding a kernel_initializer, changing the optimizer, etc., but could not get any meaningful result.

I wonder how I can solve this problem.

Your valuable opinions and thoughts will be very much appreciated.

If you want to see the full source, here is the link: https://gist.github.com/Lay4U/e1fc7d036356575f4d0799cdcebed90e

Answer

From your example, the problem simply comes from the fact that you have over 100 times more parameters than samples (330,241 parameters vs. 2,264 samples). If you reduce the size of your model, you will see less variance.
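
A quick way to see this: with fewer LSTM units, the parameter count drops from roughly 330k to a few thousand. Below is a minimal sketch, assuming the same (timesteps, 1) input shape as in the question; the layer sizes are illustrative, not tuned:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(None, 1)))  # 32 units instead of 256
model.add(Dropout(0.2))
model.add(Dense(1))  # roughly 4.4k parameters in total instead of ~330k

model.compile(loss='mean_squared_error', optimizer='rmsprop')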

The wider question you are asking is actually very interesting and is usually not covered in tutorials. Nearly all machine learning models are stochastic by nature: the output predictions change slightly every time you train, which means you will always have to ask the question: which model do I deploy to production?
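
As a side note, if the run-to-run variation itself is a problem during experimentation, you can make runs reproducible by seeding the random number generators before building the model. A minimal sketch, assuming Keras on a TensorFlow 1.x backend (on 2.x the last call is tf.random.set_seed):

import random
import numpy as np
import tensorflow as tf

random.seed(42)         # Python's built-in RNG
np.random.seed(42)      # NumPy RNG used for weight initialization
tf.set_random_seed(42)  # TensorFlow graph-level seed (1.x API)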

You can do two things:

  • Choose the first model trained on all the data (after cross-validation, ...)
  • Build all the models with the same hyperparameters and implement a simple voting strategy (see the sketch after this list)
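
For regression, a "vote" is simply an average of the predictions. Here is a minimal sketch of the second option; build_model() is a hypothetical helper that returns a compiled model like the one in the question:

import numpy as np

models = []
for _ in range(5):
    m = build_model()  # hypothetical: returns a freshly compiled model
    m.fit(x_train, y_train, epochs=20, batch_size=64, verbose=0,
          validation_data=(x_val, y_val))
    models.append(m)

# Average the five models' predictions instead of picking one of them.
p = np.mean([m.predict(x_test) for m in models], axis=0)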

References:

  • https://machinelearningmastery.com/train-final-machine-learning-model/
  • https://machinelearningmastery.com/randomness-in-machine-learning/
