Interaction terms in multiple imputations (Amelia or other mi packages)

Question

I have a question about interaction terms in multiple imputations. My understanding is that the imputation model is supposed to include all information that is used in the later analysis including any transformations or interactions of variables (the Amelia user guide also makes this statement). But when I include the interaction term int=x1*x2 in the imputation, the imputed value for int is not equal to x1*x2. For example, when I have a binary variable x2 and a continuous variable x1, int should be zero when x2 is zero. That is not the case for the imputed values of int. So how do I treat interactions in multiple imputations? Below is some example code illustrating the question.

library("Amelia")

n = 100
p.na = 0.1
n.na = ceiling(n*p.na)
set.seed(12345)
# create data
df = data.frame(
    'x1' = rnorm(n),
    'x2' = rbinom(n,1,0.5),
    'int'= NA
)
# set values of x1 to missing; the assignment runs twice, so up to 2*n.na values end up NA
df$x1[sample(1:n,n.na)]=NA
df$x1[sample(1:n,n.na)]=NA
df$int = with(df,x1*x2)
# impute
df.mi = amelia(df,m=2,noms=c("x2"))

# comparison
round(cbind(df,df.mi$imputations[[1]])[1:10,],2)
cbind(
    'df' = with(df,int==x1*x2),
    'df.mi' = with(df.mi$imputations[[1]],int==x1*x2))

Some of the output (row 6 is one of the cases discussed above, for which int != x1*x2):

      DF           DF (imputed)
      x1 x2   int    x1 x2   int
1   0.59  1  0.59  0.59  1  0.59
2   0.71  1  0.71  0.71  1  0.71
3  -0.11  0  0.00 -0.11  0  0.00
4  -0.45  1 -0.45 -0.45  1 -0.45
5   0.61  1  0.61  0.61  1  0.61
6     NA  1    NA  0.24  1  0.48
7   0.63  0  0.00  0.63  0  0.00
8  -0.28  0  0.00 -0.28  0  0.00
9  -0.28  1 -0.28 -0.28  1 -0.28
10 -0.92  1 -0.92 -0.92  1 -0.92

Recommended answer

I think Amelia is never told that int is the result of the transformation x1*x2, so it treats int as an ordinary variable. You can, however, apply a post-imputation transformation to the imputed data like this:

   df.mi = transform(df.mi, int = x2*x1)

Comparing with the original data, you get this result:

mm <- cbind(df,df.mi$imputations$imp1)
mm[mm$x2==0 & is.na(mm$int),]
   x1 x2 int         x1 x2 int
45 NA  0  NA  0.3144084  0   0
49 NA  0  NA -1.1741704  0   0
76 NA  0  NA -0.2018450  0   0
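The same post-imputation recomputation can be sketched in base R for any list of completed data sets. This is only an illustration: recompute_int is a hypothetical helper (not part of Amelia), and the two small data frames are made-up stand-ins for df.mi$imputations so that the snippet runs without any package.

```r
# Hypothetical helper (not part of Amelia): recompute the interaction in
# every completed data set so that int == x1 * x2 holds by construction.
recompute_int <- function(imputations) {
  lapply(imputations, function(d) {
    d$int <- d$x1 * d$x2
    d
  })
}

# Made-up stand-in for df.mi$imputations: in each data set the imputed
# int values in rows 2 and 3 are inconsistent with x1 * x2.
imps <- list(
  imp1 = data.frame(x1 = c(0.59, 0.24, 0.31),
                    x2 = c(1, 1, 0),
                    int = c(0.59, 0.48, 0.20)),
  imp2 = data.frame(x1 = c(0.59, -0.10, 0.05),
                    x2 = c(1, 1, 0),
                    int = c(0.59, 0.33, -0.15))
)

fixed <- recompute_int(imps)

# Check the constraint in every imputation
consistent <- sapply(fixed, function(d) isTRUE(all.equal(d$int, d$x1 * d$x2)))
print(consistent)
```

With a real Amelia result, the list passed in would be df.mi$imputations. Whether recomputing the product after imputation is the better choice for the later analysis is a separate statistical question that this sketch does not address.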

EDIT: I think I get better results using the mice package, which is described as follows:

"The algorithm imputes an incomplete column (the target column) by generating 'plausible' synthetic values given other columns in the data."

Using your data, I compare the original data.frame to all the imputed data sets when x2 is equal to 0.

library(mice)
rr <- mice(df)
mm1 <- cbind(df,do.call(cbind,lapply(1:5,function(i)complete(rr , i))))
mm1[mm1$x2==0 & is.na(mm1$int),]

  x1 x2 int        x1 x2       int        x1 x2        int         x1 x2       int        x1 x2       int        x1 x2        int
20 NA  0  NA 0.5168547  0 -0.162311 0.6203798  0  0.0000000  0.8881394  0 0.0000000 0.9371405  0 0.8248701 0.5855288  0  0.0000000
23 NA  0  NA 0.5168547  0  0.000000 0.4911883  0  0.0000000 -1.8323773  0 0.0000000 0.5855288  0 0.0000000 0.5855288  0  0.0000000
31 NA  0  NA 0.5168547  0  0.000000 0.1495920  0 -0.3240866  2.3305120  0 1.6324456 1.1207127  0 0.8544517 0.5674033  0  0.0000000
60 NA  0  NA 0.5365237  0  0.000000 0.2542712  0  0.0000000  1.5934885  0 0.9371405 0.7094660  0 0.5168547 0.2542712  0 -0.3079534
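In the output above, several imputed int values are still nonzero although x2 is 0, because int is again imputed as an ordinary column. One option mice offers for derived columns is passive imputation, where the method entry for the column is a formula string starting with ~. The following is a minimal sketch under stated assumptions, not a definitive recipe: the data are rebuilt as in the question, and the seed and the decision to drop int as a predictor of x1 and x2 are illustrative choices.

```r
library(mice)

# Rebuild the example data as in the question
n <- 100
n.na <- ceiling(n * 0.1)
set.seed(12345)
df <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5), int = NA)
df$x1[sample(1:n, n.na)] <- NA
df$int <- with(df, x1 * x2)

# Dry run (maxit = 0) only to obtain the default methods and predictor matrix
ini  <- mice(df, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: int is recomputed from x1 and x2 rather than modelled
meth["int"] <- "~I(x1*x2)"

# Assumption: keep int out of the models for its own components (avoids feedback)
pred[c("x1", "x2"), "int"] <- 0

imp <- mice(df, method = meth, predictorMatrix = pred,
            seed = 12345, printFlag = FALSE)

# Check int == x1 * x2 in every completed data set
ok <- sapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  isTRUE(all.equal(as.numeric(d$int), d$x1 * d$x2))
})
print(ok)
```

Rows with x2 equal to 0 can then be inspected as before, for example with complete(imp, 1)[df$x2 == 0 & is.na(df$int), ].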
