Which loss function is better than MSE in temperature prediction?


Question

I have feature vectors of size 1x4098. Each feature vector corresponds to a single float (a temperature). I have 10,000 training samples, so my training set is 10000x4098 and the labels are 10000x1. I want to use a linear regression model to predict temperature from the training data. I am using 3 hidden layers (512, 128, 32) with MSE loss. However, I only get 80% accuracy using TensorFlow. Could you suggest other loss functions to get better performance?

Answer

Let me give a rather theoretical explanation of the choice of loss function. As you may guess, it all depends on the data.

MSE has a nice probabilistic interpretation: it corresponds to the MLE (maximum likelihood estimator) under the assumption that the distribution p(y|x) is Gaussian: p(y|x) ~ N(mu, sigma). Since the MLE converges to the true parameter value, this means that under this assumption, the minimum found is very likely the best fit you can possibly get. Of course, you may find a local rather than a global minimum, and there is also an implicit assumption that your training data represent the x distribution well. But this kind of uncertainty is inevitable, so realistically we just accept it.
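As a small sketch of this equivalence (the toy data and the brute-force grid search are my own illustrative choices, not from the answer): for a constant predictor mu, minimizing the Gaussian negative log-likelihood and minimizing MSE pick exactly the same mu, namely the sample mean.

```python
import math

def gaussian_nll(mu, ys, sigma=1.0):
    """Negative log-likelihood of ys under N(mu, sigma^2)."""
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (y - mu) ** 2 / (2 * sigma ** 2) for y in ys)

def mse(mu, ys):
    """Mean squared error of the constant prediction mu."""
    return sum((y - mu) ** 2 for y in ys) / len(ys)

ys = [1.0, 2.0, 4.0, 5.0]
grid = [i / 100 for i in range(0, 601)]   # candidate values for mu
best_by_nll = min(grid, key=lambda m: gaussian_nll(m, ys))
best_by_mse = min(grid, key=lambda m: mse(m, ys))
# both criteria select the same mu: the sample mean of ys
```

The sigma term only shifts the NLL by a constant, which is why it drops out of the argmin and the two objectives agree.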

Moving on, minimizing the L1 loss (absolute difference) is equivalent to MLE maximization under the assumption that p(y|x) has a Laplace distribution. And the conclusion is the same: if the data fits this distribution, no other loss will work better than L1 loss.

Huber loss doesn't have a strict probabilistic interpretation (at least none I'm aware of); it sits somewhere between L1 and L2, closer to one or the other depending on the choice of delta.

How does this help you find the right loss function? First of all, it means that no loss is superior to the others by default. Secondly, the better you understand the data, the more confident you can be that your choice of loss function is correct. Of course, you can simply cross-validate all of these options and select the best one. But here's a good reason to do this kind of analysis: when you are confident in the data distribution, you will see steady improvement as you add training data and increase model complexity. Otherwise, it's entirely possible that the model will never generalize.

