MICE does not impute certain columns, but also does not give an error
Question
I know that similar questions have been asked before (e.g., 1, 2, 3), but I still cannot understand why MICE fails to impute missing values, even when I use the unconditional mean as in example 1.
My sparse matrix is:
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
[1,] NA NA NA NA NA NA NA NA NA NA 0.066667
[2,] 0.909091 NA NA NA NA 0.944723 NA NA 0.545455 NA NA
[3,] 0.545455 NA NA NA NA NA NA NA 0.818182 0.800000 0.466667
[4,] 0.545455 NA 0.642857 NA NA 0.260954 NA NA NA NA NA
[5,] NA 0.750 0.500000 NA 0.869845 NA 0.595013 NA NA NA NA
[6,] 0.727273 0.625 NA 0.583333 NA NA NA 0.500000 0.545455 NA NA
[7,] NA NA 0.571429 NA NA NA NA NA NA NA 0.866667
[8,] 0.545455 NA NA NA NA 0.905593 0.677757 NA NA NA NA
[9,] NA 0.999 0.714286 0.750000 NA NA 0.881032 NA NA 0.933333 0.733333
[10,] NA 0.750 NA NA NA NA NA NA 0.545455 NA NA
[11,] NA NA NA NA NA NA NA NA 0.818182 NA NA
[12,] NA 0.999 NA 0.583333 NA NA 0.986145 0.666667 0.909091 NA NA
[13,] 0.818182 NA 0.857143 0.583333 0.001000 NA NA NA NA 0.133333 NA
[14,] NA 0.999 0.357143 NA 0.635087 NA NA NA NA NA NA
[15,] NA 0.750 0.857143 0.250000 0.742082 0.001000 0.001000 NA 0.636364 NA 0.533333
[16,] NA 0.999 NA 0.250000 NA NA NA NA 0.909091 NA NA
[17,] 0.727273 0.999 0.001000 NA NA NA 0.886366 0.666667 0.909091 0.800000 0.933333
[18,] NA NA 0.571429 NA NA 0.953382 NA 0.833333 0.727273 NA NA
[19,] NA NA NA NA 0.661476 NA NA 0.500000 NA 0.933333 0.600000
[20,] NA NA 0.857143 NA 0.661661 0.459014 0.283793 NA NA NA NA
[21,] NA NA NA NA NA NA NA NA NA NA 0.800000
[22,] 0.454545 NA NA NA NA NA NA 0.333333 0.727273 NA 0.533333
[23,] NA NA NA 0.333333 0.790737 NA NA NA 0.727273 0.433333 NA
[24,] NA 0.875 NA NA NA NA NA NA NA 0.999000 NA
[25,] NA NA 0.571429 0.583333 NA NA 0.196147 0.500000 NA NA NA
[26,] NA 0.999 0.642857 0.250000 NA NA NA NA 0.636364 0.700000 NA
[27,] NA NA 0.714286 NA NA NA NA NA NA NA NA
[28,] NA 0.875 NA 0.500000 NA NA NA NA NA NA 0.666667
[29,] 0.636364 0.750 NA NA NA 0.999000 0.999000 NA NA NA NA
[30,] 0.727273 NA NA NA 0.916098 0.734748 NA NA NA 0.833333 NA
[31,] NA NA NA NA NA NA NA NA NA NA 0.733333
[32,] NA 0.875 NA 0.500000 NA NA NA NA 0.818182 NA NA
[33,] 0.636364 NA NA NA NA NA 0.829819 NA 0.727273 NA 0.733333
[34,] NA NA 0.500000 NA NA NA NA NA NA NA 0.666667
[35,] NA NA 0.214286 NA NA 0.529592 NA 0.001000 0.909091 NA NA
[36,] NA NA NA 0.416667 0.808369 NA NA 0.500000 0.909091 0.633333 0.733333
[37,] NA NA 0.357143 NA NA 0.837555 0.755077 NA 0.818182 NA NA
[38,] NA NA NA 0.166667 0.841643 0.364216 NA NA NA 0.733333 NA
[39,] NA NA 0.500000 0.750000 NA NA NA NA 0.818182 0.999000 0.800000
[40,] NA NA NA NA 0.931836 NA NA NA NA NA 0.133333
[41,] NA NA 0.714286 NA NA 0.848688 NA NA NA NA NA
[42,] NA NA 0.214286 0.333333 0.700812 0.208412 NA 0.333333 NA NA NA
[43,] 0.454545 NA NA NA 0.109326 0.346767 0.877241 0.833333 NA NA NA
[44,] 0.818182 NA 0.857143 NA NA 0.931636 NA NA NA 0.733333 NA
[45,] 0.363636 0.750 NA NA NA NA NA 0.166667 0.818182 NA NA
[46,] NA NA 0.785714 NA 0.738672 NA NA NA NA 0.100000 NA
[47,] 0.181818 NA NA NA NA NA NA NA NA NA 0.001000
[48,] NA NA 0.001000 0.083333 0.308050 0.139592 NA 0.166667 NA NA NA
[49,] NA NA NA NA 0.561841 0.817696 NA 0.666667 NA 0.300000 NA
[50,] NA NA NA 0.416667 NA NA NA NA 0.545455 NA 0.866667
[51,] NA 0.875 NA NA 0.039781 NA NA NA NA 0.933333 NA
[52,] NA NA 0.357143 NA NA NA NA 0.333333 NA NA NA
[53,] NA 0.999 NA NA NA 0.835015 NA NA NA 0.833333 0.666667
[54,] NA 0.750 NA 0.416667 NA NA 0.623528 0.333333 0.818182 NA NA
[55,] NA NA NA 0.666667 NA 0.878312 NA NA NA NA NA
I then apply the standard mice function as follows:
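library(mice)
# Mean imputation with maxit = 30 iterations and the default m = 5 imputed datasets
res <- mice(Sparse_Data, maxit = 30, meth = 'mean', seed = 500, print = FALSE)
# Stack the original data (.imp == 0) and all imputed datasets in long format
t <- complete(res, action = "long", include = TRUE)
# Drop the original data, then average the imputed datasets cell by cell
out <- split(t, f = t$.imp)[-1]
a <- Reduce("+", out) / length(out)
# Drop the two bookkeeping columns (.imp and .id)
data_Pred <- a[, 3:ncol(a)]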
The imputed matrix I get is:
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
56 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.066667
57 0.9090910 0.8676667 0.5373542 0.4429824 0.6069598 0.9447230 NA 0.4583958 0.5454550 0.6959606 NA
58 0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.8000000 0.466667
59 0.5454550 0.8676667 0.6428570 0.4429824 0.6069598 0.2609540 NA 0.4583958 0.7561986 0.6959606 NA
60 0.6060607 0.7500000 0.5000000 0.4429824 0.8698450 0.6313629 0.595013 0.4583958 0.7561986 0.6959606 NA
61 0.7272730 0.6250000 0.5373542 0.5833330 0.6069598 0.6313629 NA 0.5000000 0.5454550 0.6959606 NA
62 0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.866667
63 0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.9055930 0.677757 0.4583958 0.7561986 0.6959606 NA
64 0.6060607 0.9990000 0.7142860 0.7500000 0.6069598 0.6313629 0.881032 0.4583958 0.7561986 0.9333330 0.733333
65 0.6060607 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.5454550 0.6959606 NA
66 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.6959606 NA
67 0.6060607 0.9990000 0.5373542 0.5833330 0.6069598 0.6313629 0.986145 0.6666670 0.9090910 0.6959606 NA
68 0.8181820 0.8676667 0.8571430 0.5833330 0.0010000 0.6313629 NA 0.4583958 0.7561986 0.1333330 NA
69 0.6060607 0.9990000 0.3571430 0.4429824 0.6350870 0.6313629 NA 0.4583958 0.7561986 0.6959606 NA
70 0.6060607 0.7500000 0.8571430 0.2500000 0.7420820 0.0010000 0.001000 0.4583958 0.6363640 0.6959606 0.533333
71 0.6060607 0.9990000 0.5373542 0.2500000 0.6069598 0.6313629 NA 0.4583958 0.9090910 0.6959606 NA
72 0.7272730 0.9990000 0.0010000 0.4429824 0.6069598 0.6313629 0.886366 0.6666670 0.9090910 0.8000000 0.933333
73 0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.9533820 NA 0.8333330 0.7272730 0.6959606 NA
74 0.6060607 0.8676667 0.5373542 0.4429824 0.6614760 0.6313629 NA 0.5000000 0.7561986 0.9333330 0.600000
75 0.6060607 0.8676667 0.8571430 0.4429824 0.6616610 0.4590140 0.283793 0.4583958 0.7561986 0.6959606 NA
76 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.800000
77 0.4545450 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.3333330 0.7272730 0.6959606 0.533333
78 0.6060607 0.8676667 0.5373542 0.3333330 0.7907370 0.6313629 NA 0.4583958 0.7272730 0.4333330 NA
79 0.6060607 0.8750000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.9990000 NA
80 0.6060607 0.8676667 0.5714290 0.5833330 0.6069598 0.6313629 0.196147 0.5000000 0.7561986 0.6959606 NA
81 0.6060607 0.9990000 0.6428570 0.2500000 0.6069598 0.6313629 NA 0.4583958 0.6363640 0.7000000 NA
82 0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 NA
83 0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.666667
84 0.6363640 0.7500000 0.5373542 0.4429824 0.6069598 0.9990000 0.999000 0.4583958 0.7561986 0.6959606 NA
85 0.7272730 0.8676667 0.5373542 0.4429824 0.9160980 0.7347480 NA 0.4583958 0.7561986 0.8333330 NA
86 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.733333
87 0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.6959606 NA
88 0.6363640 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 0.829819 0.4583958 0.7272730 0.6959606 0.733333
89 0.6060607 0.8676667 0.5000000 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.666667
90 0.6060607 0.8676667 0.2142860 0.4429824 0.6069598 0.5295920 NA 0.0010000 0.9090910 0.6959606 NA
91 0.6060607 0.8676667 0.5373542 0.4166670 0.8083690 0.6313629 NA 0.5000000 0.9090910 0.6333330 0.733333
92 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.8375550 0.755077 0.4583958 0.8181820 0.6959606 NA
93 0.6060607 0.8676667 0.5373542 0.1666670 0.8416430 0.3642160 NA 0.4583958 0.7561986 0.7333330 NA
94 0.6060607 0.8676667 0.5000000 0.7500000 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.9990000 0.800000
95 0.6060607 0.8676667 0.5373542 0.4429824 0.9318360 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.133333
96 0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.8486880 NA 0.4583958 0.7561986 0.6959606 NA
97 0.6060607 0.8676667 0.2142860 0.3333330 0.7008120 0.2084120 NA 0.3333330 0.7561986 0.6959606 NA
98 0.4545450 0.8676667 0.5373542 0.4429824 0.1093260 0.3467670 0.877241 0.8333330 0.7561986 0.6959606 NA
99 0.8181820 0.8676667 0.8571430 0.4429824 0.6069598 0.9316360 NA 0.4583958 0.7561986 0.7333330 NA
100 0.3636360 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.1666670 0.8181820 0.6959606 NA
101 0.6060607 0.8676667 0.7857140 0.4429824 0.7386720 0.6313629 NA 0.4583958 0.7561986 0.1000000 NA
102 0.1818180 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.001000
103 0.6060607 0.8676667 0.0010000 0.0833330 0.3080500 0.1395920 NA 0.1666670 0.7561986 0.6959606 NA
104 0.6060607 0.8676667 0.5373542 0.4429824 0.5618410 0.8176960 NA 0.6666670 0.7561986 0.3000000 NA
105 0.6060607 0.8676667 0.5373542 0.4166670 0.6069598 0.6313629 NA 0.4583958 0.5454550 0.6959606 0.866667
106 0.6060607 0.8750000 0.5373542 0.4429824 0.0397810 0.6313629 NA 0.4583958 0.7561986 0.9333330 NA
107 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.6313629 NA 0.3333330 0.7561986 0.6959606 NA
108 0.6060607 0.9990000 0.5373542 0.4429824 0.6069598 0.8350150 NA 0.4583958 0.7561986 0.8333330 0.666667
109 0.6060607 0.7500000 0.5373542 0.4166670 0.6069598 0.6313629 0.623528 0.3333330 0.8181820 0.6959606 NA
110 0.6060607 0.8676667 0.5373542 0.6666670 0.6069598 0.8783120 NA 0.4583958 0.7561986 0.6959606 NA
Can anyone shed some light on the problem?
Recommended answer
OK, so here's the deal: mice relies on its predictorMatrix. This matrix determines which columns are used to predict the missing values of each variable. If a variable's column is empty (all zeros), that variable will not be imputed, regardless of the method you specify.
You can check this matrix by running mice and then typing res$pred. As you can see, the columns for k11 and k15 are empty, so those variables aren't imputed. Purely as an illustration (NOT A SOLUTION), try specifying mice(pred = diag(ncol(Sparse_Data)), ...). You'll see that it now works.
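A minimal sketch of that check in base R. The helper name unused_variables and the toy matrix are made up for illustration; res$predictorMatrix (which res$pred abbreviates) and res$loggedEvents are components of the object that mice returns.

```r
# Hypothetical helper (not part of mice): names of the variables whose
# column in a predictor matrix contains only zeros.
unused_variables <- function(pred) {
  colnames(pred)[colSums(pred != 0) == 0]
}

# Toy 3 x 3 predictor matrix: rows are targets, columns are predictors.
vars <- c("a", "b", "c")
pred <- matrix(1, nrow = 3, ncol = 3, dimnames = list(vars, vars))
diag(pred) <- 0      # a variable does not predict itself
pred[, "c"] <- 0     # mimic a variable that has been dropped
pred["c", ] <- 0

print(unused_variables(pred))

# With the fitted object `res` from the question, the same check would be:
if (exists("res") && inherits(res, "mids")) {
  print(unused_variables(res$predictorMatrix))
  print(res$loggedEvents)  # log of the variables mice removed, if any
}
```

The guarded part only runs when a mice result named res is already in the workspace, so the sketch can be tried without the package installed.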
So why does mice leave those two columns empty? I looked into the source code of mice: it contains a function called check.data, which in turn calls find.collinear. That function identifies the variables that are collinear, and those variables are then removed in subsequent steps.
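The idea behind such a screen can be sketched in base R. This is an illustrative simplification and not the code of find.collinear: the function name flag_collinear, the 0.999 threshold, the rule of flagging the later variable of each pair, and the toy data are all assumptions.

```r
# Illustrative sketch: flag variables that are (almost) perfectly correlated
# with an earlier variable, using pairwise-complete observations.
flag_collinear <- function(x, threshold = 0.999) {
  x <- data.matrix(x)
  # Pairs with fewer than two jointly observed rows yield NA (and a warning).
  r <- suppressWarnings(cor(x, use = "pairwise.complete.obs"))
  # Look at each pair once (upper triangle) and ignore undefined correlations.
  hit <- upper.tri(r) & !is.na(r) & abs(r) >= threshold
  colnames(x)[apply(hit, 2, any)]
}

# Toy data: `a` and `c` are jointly observed in rows 2 and 4 only.
toy <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, NA, 1, 3, 9),
  c = c(NA, 4, NA, 8, NA)
)

print(flag_collinear(toy))
```

With very sparse data, a screen of this kind flags variables that merely share very few rows with another variable, even when they are not collinear in any substantive sense.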
Are any of your columns collinear? Well, yes:
cor(Sparse_Data, use = "pairwise.complete.obs")
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
k1 1.0000000 1.740412e-01 0.24932705 NA 0.17164319 0.640984131 0.3053596 0.4225772 -0.536055739 -0.50460872 0.97321365
k3 0.1740412 1.000000e+00 -0.42409199 -9.370804e-05 -0.38583663 0.361416106 0.5515156 0.6567106 0.634250161 -0.70631658 0.74001342
k5 0.2493271 -4.240920e-01 1.00000000 4.471829e-01 0.02679894 0.234850334 -0.6624768 0.4201946 -0.924517670 -0.45408744 -0.78628746
k6 NA -9.370804e-05 0.44718290 1.000000e+00 -0.35377747 0.818644775 0.6824749 0.8899878 0.147657537 0.27030472 0.49159991
k7 0.1716432 -3.858366e-01 0.02679894 -3.537775e-01 1.00000000 0.207791538 -0.6406942 -0.2863018 0.898687181 0.14987951 -0.70210859
k8 0.6409841 3.614161e-01 0.23485033 8.186448e-01 0.20779154 1.000000000 0.7491736 0.5219197 0.002468839 -0.13067177 1.00000000
k11 0.3053596 5.515156e-01 -0.66247684 6.824749e-01 -0.64069422 0.749173578 1.0000000 0.5925582 0.830372468 -1.00000000 0.83452358
k12 0.4225772 6.567106e-01 0.42019459 8.899878e-01 -0.28630180 0.521919747 0.5925582 1.0000000 -0.134937885 -0.49251775 0.92582043
k13 -0.5360557 6.342502e-01 -0.92451767 1.476575e-01 0.89868718 0.002468839 0.8303725 -0.1349379 1.000000000 0.29508347 0.13853862
k14 -0.5046087 -7.063166e-01 -0.45408744 2.703047e-01 0.14987951 -0.130671767 -1.0000000 -0.4925177 0.295083470 1.00000000 0.02558161
k15 0.9732137 7.400134e-01 -0.78628746 4.915999e-01 -0.70210859 1.000000000 0.8345236 0.9258204 0.138538625 0.02558161 1.00000000
As you can see, k11 is perfectly correlated with k14 (r = -1), and k15 with k8 (r = 1). This is why they get kicked out.
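One way to see where these perfect correlations come from: as far as can be read from the matrix posted in the question, k11 and k14 are observed together only in rows 9 and 17, and k8 and k15 only in rows 15 and 53. A pairwise correlation computed from two points is always +1 or -1. The sketch below reproduces this for k11 and k14 from a four-row excerpt of the posted data; the object names are made up for illustration.

```r
# Excerpt of the posted matrix (rows 5, 9, 17 and 24; columns k11 and k14).
excerpt <- data.frame(
  k11 = c(0.595013, 0.881032, 0.886366, NA),
  k14 = c(NA,       0.933333, 0.800000, 0.999000),
  row.names = c("5", "9", "17", "24")
)

# Matrix of pairwise counts: in how many rows are both variables observed?
obs <- !is.na(as.matrix(excerpt))
n_joint <- crossprod(obs * 1)
print(n_joint)

# Only two jointly observed rows, so the points lie exactly on a line.
r <- cor(excerpt$k11, excerpt$k14, use = "pairwise.complete.obs")
print(r)
```

Running the same two lines that build n_joint on the full Sparse_Data shows at a glance which pairs of columns overlap in too few rows for their correlation to be meaningful.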
So, there are two solutions: either make sure there are no perfectly correlated pairs in your matrix or, in this case, just provide the predictorMatrix yourself.
To further prove my point, try running this code before your own code and you'll see that it does indeed work:
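# Assumes Sparse_Data is a data frame (the $ operator does not work on a matrix).
# Row 1 becomes a third jointly observed row for k11/k14 and for k8/k15,
# which breaks the perfect pairwise correlations.
Sparse_Data$k11[1] <- 2
Sparse_Data$k15[1] <- 2
Sparse_Data$k8[1] <- 0.5
Sparse_Data$k14[1] <- 0.5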
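For the unconditional-mean case in the question, the expected result can also be computed without mice as a sanity check. With meth = 'mean', each missing cell should receive the mean of the observed values in its column; the value 0.6060607 that fills k1 in the question's output is the mean of the 15 observed k1 values. A minimal sketch, where the helper name fill_col_means and the toy data are made up for illustration:

```r
# Hypothetical helper: replace every NA by the mean of the observed
# values in the same column.
fill_col_means <- function(x) {
  x <- as.data.frame(x)
  for (j in seq_along(x)) {
    miss <- is.na(x[[j]])
    x[[j]][miss] <- mean(x[[j]], na.rm = TRUE)
  }
  x
}

# Toy data with the same kind of gaps as in the question.
toy_means <- data.frame(k1 = c(0.2, NA, 0.4), k11 = c(NA, 0.5, NA))
filled <- fill_col_means(toy_means)
print(filled)
```

Comparing fill_col_means(Sparse_Data) with data_Pred column by column would show directly which columns mice skipped.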