How to perform RMSE with missing values?


Problem description

I have a huge dataset with 679 rows and 16 columns, and 30% of its values are missing. I decided to impute these missing values with the function impute.knn from the package impute, which gave me a dataset with 679 rows and 16 columns and no missing values.

Now I want to check the accuracy of the imputation using the RMSE, and I tried 2 options:

  1. Load the package hydroGOF and apply the rmse function
  2. sqrt(mean((obs - sim)^2, na.rm = TRUE))

In both cases I get the error: errors in sim .obs: non numeric argument to binary operator.

This is happening because the original dataset contains NA values (some values are missing).

How can I calculate the RMSE if I remove the missing values? obs and sim would then have different sizes.

Recommended answer

Simple...

sqrt( sum( (df$model - df$measure)^2 , na.rm = TRUE ) / nrow(df) )

Obviously this assumes your data frame is called df, and you have to decide on your N (i.e. nrow(df) includes the two rows with missing data; do you want to exclude these from the N observations? I'd guess yes, so instead of nrow(df) you probably want to use sum( !is.na(df$measure) )), or, following @Joshua, just

sqrt( mean( (df$model-df$measure)^2 , na.rm = TRUE ) )
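
As a rough check of the two formulas above, here is a minimal sketch on a made-up data frame. The values in df are hypothetical and chosen only for illustration; the column names model and measure follow the answer. It assumes both columns are numeric and that NA appears only where an observation is missing.

```r
# Hypothetical toy data: `measure` has two missing observations.
df <- data.frame(
  model   = c(2.0, 3.5, 4.0, 5.5, 7.0),
  measure = c(1.0, NA, 5.0, NA, 9.0)
)

# Squared errors; the result is NA wherever either value is missing,
# so the two columns stay aligned row by row.
sq_err <- (df$model - df$measure)^2

# Option 1: sum the non-missing squared errors and divide by their count.
n_valid  <- sum(!is.na(sq_err))
rmse_sum <- sqrt(sum(sq_err, na.rm = TRUE) / n_valid)

# Option 2: mean() with na.rm = TRUE drops the same NA entries.
rmse_mean <- sqrt(mean(sq_err, na.rm = TRUE))

# For comparison: dividing by nrow(df) keeps the missing rows in N,
# which makes the result smaller.
rmse_all_rows <- sqrt(sum(sq_err, na.rm = TRUE) / nrow(df))

print(c(rmse_sum = rmse_sum, rmse_mean = rmse_mean, rmse_all_rows = rmse_all_rows))
```

Because the subtraction is done row by row before any NA is dropped, obs and sim never need to be shortened separately, so they cannot end up with different sizes.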
