R KNN imputation - function returning erroneous results & missing help page

Problem description

I'm trying to impute some missing values in R using library(imputation) and kNNImpute(). The input data frame is 44 rows of 13 variables. There are 30 complete observations and 14 observations with missing values in 2 columns.

The function reports that it imputed all the missing values; however, it imputes the last 4 values as 0. From my reading of the code, this appears to be a flaw caused by using 0 as the default when an error occurs. My code:

# impute data
library(imputation)
knn_data <- kNNImpute(x, k = 5)

# examine kNNImpute code
kNNImpute

Relevant lines of kNNImpute's source: see lines 4 and 8, the function starting on line 24, and the second line from the bottom (line 48):

[4]  prelim = impute.prelim(x)
[8]  x.missing = prelim$x.missing
[24] x.missing.imputed = t(apply(x.missing, 1, function(i) {...}
[48] x[missing.matrix2] = 0

??impute.prelim returns no results (the help page is missing). So, I can't examine this code.
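
If impute.prelim is an internal function of the package, which would be consistent with the empty help search, its source can usually still be read straight from the package namespace. The sketch below is only an illustration: find_in_namespace is a hypothetical helper name, and stats with plot.lm is a stand-in so the example runs without the imputation package installed.

```r
# Hypothetical helper: fetch an object from a package namespace by name,
# whether or not the package exports it. Returns NULL if it is not there.
find_in_namespace <- function(fn_name, pkg) {
  ns <- asNamespace(pkg)
  if (exists(fn_name, envir = ns, inherits = FALSE)) {
    get(fn_name, envir = ns, inherits = FALSE)
  } else {
    NULL
  }
}

# Stand-in example: plot.lm is a method that lives in the stats namespace.
f <- find_in_namespace("plot.lm", "stats")
is.function(f)
```

With the package installed, find_in_namespace("impute.prelim", "imputation") or imputation:::impute.prelim should print the source, assuming the function is defined in that namespace. getAnywhere("impute.prelim") is another option once the package is loaded.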

However, the program flow for kNNImpute appears to be:

[4]  # run a (seemingly undefined) screening function
[8]  # pull in the missing rows for later imputation
[24] # run imputation function
[48] # based on line [4] output, impute all "error rows"  == 0
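
To check which originally missing cells came back as exactly 0, the input and the imputed output can be compared cell by cell. This is a minimal sketch on toy matrices: find_zero_fills is a hypothetical helper, and the two matrices are made-up stand-ins for the real input and the imputed result.

```r
# Hypothetical helper: flag cells that were NA in the original data
# but are exactly 0 after imputation.
find_zero_fills <- function(original, imputed) {
  original <- as.matrix(original)
  imputed <- as.matrix(imputed)
  stopifnot(identical(dim(original), dim(imputed)))
  which(is.na(original) & imputed == 0, arr.ind = TRUE)
}

# Toy data standing in for the real input and output (assumption).
original <- matrix(c(1, NA, 3, NA, 5, 6), nrow = 3)
imputed  <- matrix(c(1, 2.5, 3, 0, 5, 6), nrow = 3)

zero_fills <- find_zero_fills(original, imputed)
print(zero_fills)  # one row/col pair per suspicious cell
```

The return structure of kNNImpute is not shown in this post, so pass whichever element of its result holds the imputed matrix. A genuine imputed value of exactly 0 is possible in principle, so flagged cells are candidates to inspect rather than proof of an error.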

Can anyone explain why this is happening and/or how to solve this problem?

FYI: I have emailed the package author a link to this page.

Recommended answer

Solution: I used code identical to the kNNImpute() function to impute the 4 improperly imputed values.

impute.fn <- function(scores, distances, raw_dist) {
  knn.values <- scores[c(as.integer(names(distances)))]
  knn.weights <- 1 - (distances / max(raw_dist))
  weighted.mean(knn.values, knn.weights)
}

# impute errors - rows 41-44 are improperly imputed
# rows 1-30 have non-missing values
#---------------------------------------------------------
x.dist <- as.matrix(dist(x))
dist_41 <- x.dist[41, c(1:30)][order(x.dist[41, c(1:30)])]
...

# fix impute - column 1
x$ABC[41] <- impute.fn(x$ABC, dist_41[1:5], dist_41)
...
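
The same fix can be written as a loop instead of repeating the steps for each row. The following is a minimal sketch on synthetic data, not the original data set: the column names DEF and GHI, the 10-row data frame, and the positions of the missing values are all assumptions made for illustration. impute.fn is the helper defined above, repeated so the example runs on its own.

```r
# Weighted-mean helper from the answer above.
impute.fn <- function(scores, distances, raw_dist) {
  knn.values <- scores[as.integer(names(distances))]
  knn.weights <- 1 - (distances / max(raw_dist))
  weighted.mean(knn.values, knn.weights)
}

# Synthetic stand-in data (assumption): 10 rows, 3 columns,
# with column ABC missing in rows 9 and 10.
set.seed(1)
x <- data.frame(ABC = rnorm(10), DEF = rnorm(10), GHI = rnorm(10))
x$ABC[9:10] <- NA

k <- 5
complete_rows <- which(complete.cases(x))
rows_to_fix <- which(is.na(x$ABC))

# dist() drops NA coordinates pairwise and rescales the result,
# so rows with a missing ABC still get a distance to every other row.
x.dist <- as.matrix(dist(x))

for (i in rows_to_fix) {
  # Distances from row i to the complete rows, nearest first.
  # sort() keeps the row names that impute.fn uses as indices.
  d <- sort(x.dist[i, complete_rows])
  x$ABC[i] <- impute.fn(x$ABC, d[1:k], d)
}

print(x$ABC[rows_to_fix])
```

Because the weights are non-negative, each imputed value is a weighted average of its k nearest complete neighbours and therefore lies within their range. The helper indexes by row name, so it relies on the data frame keeping its default row names 1, 2, 3, and so on.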

An appropriate answer from the package author (or other) would still be appreciated.

Note: I have rewritten the imputation package for wKNN. The improved package can be found here: imputation
