组合多重插补数据的自由度、t 统计量和 f 值 [英] degrees of freedom, t-statistic, and f-values of combined multiply imputed data

查看:64
本文介绍了组合多重插补数据的自由度、t 统计量和 f 值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 R 新手用户.我安装了 Zelig 4.1-3 版和 Amelia II 1.7 版.我对如何使用 R 包和函数获得组合乘法插补数据的自由度、t 统计量和 f 值感到困惑.

I am a novice R user. I installed Zelig version 4.1-3 and Amelia II version 1.7. I am puzzled on how I can obtain the degrees of freedom, t-statistic, and f-values of combined multiply imputed data using R packages and functions.

首先,我加载了 Amelia 和 Zelig:

First, I loaded Amelia and Zelig:

require(Amelia)
require(Zelig)

然后,我加载了 Amelia 附带的示例数据:

Then, I loaded the sample data that came with Amelia:

data(freetrade)

我使用 amelia 函数为此数据集创建了 5 个插补.

I created 5 imputations for this dataset using the amelia function.

a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country")  

然后,为了合并插补,我使用了 zelig 函数:

Then, to combine imputations, I used the zelig function:

z.out.imp <- zelig(tariff ~ polity + pop + gdp.pc + year + country, 
     data = a.out$imputations, model = "ls" )

但是,当我使用此代码时,我得到的系数似乎是单个插补的系数,而不是组合集的系数:

However, I got coefficients that appeared to be coefficients of individual imputations, and not those of the combined set when I used this code:

summary(z.out.imp)

它们如下:

Coefficients:
                           Value   Std. Error     t-stat      p-value
(Intercept)         2.766176e+03 6.670110e+02  4.1471215 0.0003572868
polity              1.645011e-01 3.078134e-01  0.5344183 0.5938286336
pop                -6.079963e-08 6.518429e-08 -0.9327345 0.3774275934
gdp.pc             -4.246794e-04 1.945866e-03 -0.2182470 0.8319093062
year               -1.335563e+00 3.519513e-01 -3.7947390 0.0009787456
countryIndonesia   -7.000319e+01 4.646330e+01 -1.5066343 0.1700377061
countryKorea       -8.643855e+01 4.671629e+01 -1.8502870 0.0926657863
countryMalaysia    -8.815182e+01 5.389486e+01 -1.6356256 0.1393312364
countryNepal       -8.215250e+01 5.475828e+01 -1.5002753 0.1702129176
countryPakistan    -4.349869e+01 5.149729e+01 -0.8446791 0.4238033944
countryPhilippines -8.088975e+01 5.320694e+01 -1.5202857 0.1673234716
countrySriLanka    -7.668840e+01 5.695485e+01 -1.3464771 0.2161986616
countryThailand    -7.400481e+01 5.186395e+01 -1.4269026 0.1903428838

Amelia 手册显示了组合乘法估算数据集的一些系数应该是什么,尽管没有解释如何使用 R 获得所有系数(请参阅 http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf)

The Amelia manual shows what some of the coefficients for the combined multiply imputed dataset should be although there is no explanation on how to obtain all of them using R (see page 46 of http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf)

Complete DF = 167
DF:   min   = 10.36
      avg   = 18.81
      max   = 37.62
F( 2, 10.4) = 15.50
Prob > F    = 0.0008


                    Value        Std. Error     t-stat       p-value
polity             -0.206        0.39            -0.53       0.61 
pop                -3.21 e-08    8.72e-09         3.68       0.004
gdp.pc             -0.0027       0.000644        -4.28       0.000
Intercept           32.7         2.66            12.29       0.000

由于 amelia 函数使用蒙特卡罗模拟,我们可以预期运行之间的差异很小.然而,截距的巨大差异是一个线索,即 zelig 函数返回的回归统计数据不是组合集.

Because the amelia function uses monte carlo simulations, we can expect small differences between runs. However, the huge difference in the intercept was a clue that the zelig function returned regression statistics for something else than the combined set.

Amelia 手册提供了以下代码:

The Amelia manual provides this code:

> b.out <-NULL
> se.out <-NULL
> for(i in 1:a.out$m){
+ ols.out <- lm(tariff ~ polity + pop + gdp.pc, data = a.out$imputations[[i]])
+ b.out <- rbind(b.out, ols.out$coef)
+ se.out <-rbind(se.out, coef(summary(ols.out))[,2])
+ }
> combined.results<-mi.meld(q=b.out, se = se.out)
> combined.results

我尝试使用它.返回的结果非常接近第 46 页上显示的值和标准错误:

I tried using it. The returned results are very close to the values and standard errors shown on page 46:

$q.mi
     (Intercept)     polity          pop       gdp.pc
[1,]    33.17325 -0.1499587 2.967196e-08 -0.002724229

$se.mi
     (Intercept)   polity          pop       gdp.pc
[1,]    2.116721 0.276968 6.061993e-09 0.0006596203

但是,它们不包括 t 统计量、自由度或 f 值.

However, they do not include the t-statistic, degrees of freedom, or f-values.

R 中是否有可用的开源包或函数,以便我无需手动计算即可获得自由度、t 统计量和 f 值?

Are there open-source packages or functions available in R so that I can obtain the degrees of freedom, t-statistic, and f-values without having to do manual calculations?

谢谢.

推荐答案

这是我试图重现您的问题的注释和编辑记录:

Here is an annotated and edited transcript of my attempt to reproduce your problem:

> library(Zelig)

ZELIG (Versions 4.1-3, built: 2013-01-30)

> a.out <- amelia(freetrade, idvars="country", m = 5)
Error: could not find function "amelia"

我遇到的第一个问题是你没有提到我们需要加载 Amelia 包.更正后,我再次尝试运行第一行:

The first problem I had is that you did not mention that we need to load the Amelia package. After correcting that I tried again to run the first line:

> library(Amelia)
## (Version 1.7, built: 2013-02-10)
> a.out <- amelia(freetrade, idvars="country", m = 5)
Error in amelia(freetrade, idvars = "country", m = 5) : 
  object 'freetrade' not found

这失败了,因为您没有说明如何获取自由贸易数据.在这里猜测:

This fails because you did not say how to get the freetrade data. Guessing here:

> data(freetrade)
> a.out <- amelia(freetrade, m = 5)
Amelia Error Code:  38 
 The following variable(s) are characters: 
     country
You may have wanted to set this as a ID variable to remove it
from the imputation model or as an ordinal or nominal
variable to be imputed.  Please set it as either and
try again. 

您的示例不起作用,因为需要告诉 amelia 函数如何处理字符变量.所以我修改了你的例子以获得一个可以实际运行的例子:

Your example does not work because the amelia function needs to be told what to do with character variables. So I modified your example in order to get one that would actually run:

> a.out <- amelia(freetrade, idvars="country", m = 5)
> z.out.imp <- zelig(tariff ~ polity + pop + gdp.pc + year + country, 
+      data = a.out$imputations, model = "ls")

对结果运行 summary 给出了组合模型统计数据"

Running summary on the result gives me the combined model statistics"

# This part works just fine.
> summary(z.out.imp)

  Model: ls
  Number of multiply imputed data sets: 5 

Combined results:

Call:
lm(formula = formula, weights = weights, model = F, data = data)

Coefficients:
                           Value   Std. Error     t-stat      p-value
(Intercept)         3.294715e+03 6.425487e+02  5.1275725 1.330807e-05
polity              2.761343e-01 3.354271e-01  0.8232319 4.145813e-01
pop                -6.443769e-08 5.526885e-08 -1.1658953 2.659143e-01
gdp.pc              4.549885e-04 1.354139e-03  0.3359984 7.382138e-01
year               -1.599422e+00 3.306932e-01 -4.8365739 2.649602e-05
countryIndonesia   -7.396526e+01 4.112206e+01 -1.7986760 1.009329e-01
countryKorea       -9.673542e+01 5.036909e+01 -1.9205317 8.713903e-02
countryMalaysia    -9.271187e+01 4.998836e+01 -1.8546690 9.424041e-02
countryNepal       -8.863525e+01 4.920061e+01 -1.8015072 9.990792e-02
countryPakistan    -4.789370e+01 4.362907e+01 -1.0977475 2.960914e-01
countryPhilippines -8.548672e+01 4.662372e+01 -1.8335456 9.533829e-02
countrySriLanka    -8.446560e+01 4.939918e+01 -1.7098586 1.170170e-01
countryThailand    -8.026702e+01 4.741244e+01 -1.6929529 1.213329e-01

For combined results from datasets i to j, use summary(x, subset = i:j).
For separate results, use print(summary(x), subset = i:j).

简而言之,您的示例中唯一对我有用的是您声称对您不起作用的一件事.请发布代码和输出,准确显示您做了什么以及发生了什么,因为目前我没有足够的信息来帮助解决问题.

In short, the only thing in your example that works for me is the one thing you claim does not work for you. Please post the code and output showing exactly what you did and exactly what happened, because at the moment I don't have enough information to help solve the problem.

这篇关于组合多重插补数据的自由度、t 统计量和 f 值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆