Getting multiple regression metrics at once

Question

I'm using Spark's ML package for regression and getting good results on my data. I'm now trying to get multiple metrics at once. Right now I'm following the examples here: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html

Basically the code in the examples is this:

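import org.apache.spark.ml.evaluation.RegressionEvaluator

// `predictions` is the DataFrame produced by the fitted model's transform() on the test data
val evaluator = new RegressionEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("rmse")
val rmse = evaluator.evaluate(predictions)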

This gives me the RMSE for my test data, which is fine, but I'm also interested in MSE, MAE, MAPE, R² and Q². I therefore looked at the documentation here:

https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/ml/evaluation/RegressionEvaluator.html#metricName%28%29

There I see that I can get RMSE, MSE, MAE and R², but it does not appear that I can get them all computed at once, going over the data rows only once rather than once per metric, as the example code suggests is needed.
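For comparison, here is a minimal sketch of the multi-pass approach described above, assuming a `predictions` DataFrame with numeric `label` and `prediction` columns. The helper name `metricsOneByOne` is hypothetical. Each `evaluate` call launches its own Spark job over `predictions`:

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.DataFrame

// Hypothetical helper: the multi-pass baseline. Every evaluate() call
// scans `predictions` again, so four metrics cost four passes.
def metricsOneByOne(predictions: DataFrame): Map[String, Double] = {
  val evaluator = new RegressionEvaluator()
    .setLabelCol("label")
    .setPredictionCol("prediction")
  Seq("rmse", "mse", "mae", "r2").map { name =>
    name -> evaluator.setMetricName(name).evaluate(predictions)
  }.toMap
}
```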

How can I achieve that single-pass computation?

Then, MAPE and Q² are missing. How can I compute those as well, ideally in the same pass as the other four?

Regards

Solution

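Looking at the source code for RegressionEvaluator, I discovered that each evaluate() call creates a new RegressionMetrics object (from the spark.mllib package), and that RegressionMetrics computes all of its statistics at once through a single MultivariateStatisticalSummary. So building one RegressionMetrics myself gives every metric from one pass.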
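A minimal sketch of using RegressionMetrics directly, assuming the same `label` and `prediction` column names as in the question. The helper name `allMetricsAtOnce` is hypothetical. Inside RegressionMetrics the summary is a lazily computed value, so the first accessor triggers the aggregation and the others reuse it:

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: one RegressionMetrics object serves all four metrics.
def allMetricsAtOnce(predictions: DataFrame): Map[String, Double] = {
  // RegressionMetrics expects an RDD of (prediction, observation) pairs
  val predictionAndObservations = predictions
    .select(col("prediction").cast("double"), col("label").cast("double"))
    .rdd
    .map(row => (row.getDouble(0), row.getDouble(1)))
  val metrics = new RegressionMetrics(predictionAndObservations)
  // The first accessor runs the aggregation; the rest reuse the cached summary
  Map(
    "rmse" -> metrics.rootMeanSquaredError,
    "mse" -> metrics.meanSquaredError,
    "mae" -> metrics.meanAbsoluteError,
    "r2" -> metrics.r2)
}
```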

Looking further at the documentation, I understood that Q² is simply R² computed on the validation set, so the existing r2 metric already covers it.
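For reference, this is the R² that RegressionMetrics reports with its default (not through-origin) setting. Evaluated over the n held-out rows, with \bar{y} the mean of their observations, it is the quantity treated here as Q²:

```latex
R^2 = 1 - \frac{SS_{\text{err}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```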

But for MAPE, the two columns fed into the MultivariateStatisticalSummary (the observation and the error, observation − prediction) were not enough, so I had to add a third one, the absolute percentage error, like this:

val absPercentageError =
  if (observation != 0) math.abs((observation - prediction) / observation)
  else 0.0  // avoid dividing by zero when the observation is 0

And then MAPE is just this:

def meanAbsolutePercentageError: Double = {
  summary.mean(2)  // mean of the third column, as a fraction (multiply by 100 for percent)
}
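Putting the pieces together, here is a minimal self-contained sketch of such a single-pass computation. It mirrors the structure of Spark's RegressionMetrics (observation and error columns aggregated with a MultivariateOnlineSummarizer via treeAggregate) and adds the MAPE column. `singlePassMetrics` is a hypothetical name, not a Spark API, and the sketch assumes an RDD of (prediction, observation) pairs:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
import org.apache.spark.rdd.RDD

// Hypothetical helper (not a Spark API): every metric below comes from a
// single aggregation over the (prediction, observation) pairs.
def singlePassMetrics(predictionAndObservations: RDD[(Double, Double)]): Map[String, Double] = {
  val summary = predictionAndObservations.map { case (prediction, observation) =>
    // Absolute percentage error; zero observations contribute 0 (but still count in n)
    val ape =
      if (observation != 0) math.abs((observation - prediction) / observation)
      else 0.0
    // Columns: 0 = observation, 1 = error, 2 = absolute percentage error
    Vectors.dense(observation, observation - prediction, ape)
  }.treeAggregate(new MultivariateOnlineSummarizer())(
    (acc, v) => acc.add(v),
    (acc1, acc2) => acc1.merge(acc2))

  val n = summary.count.toDouble
  val ssErr = math.pow(summary.normL2(1), 2) // sum of squared errors
  val ssTot = summary.variance(0) * (n - 1)  // total sum of squares of the observations
  val mse = ssErr / n
  Map(
    "mse" -> mse,
    "rmse" -> math.sqrt(mse),
    "mae" -> summary.normL1(1) / n,
    "r2" -> (1 - ssErr / ssTot),
    "mape" -> summary.mean(2)) // a fraction; multiply by 100 for a percentage
}
```

Like RegressionMetrics, this reuses one summarizer for every metric, so adding further per-row columns (for example other error terms) keeps the computation to a single pass.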

So now I have all the metrics I need, and because they are all derived from a single MultivariateStatisticalSummary aggregation, I'm confident the dataset is processed only once.