获取组均值差的p值,而无需使用新的参考水平重新拟合线性模型 [英] Get p-value for group mean difference without refitting linear model with a new reference level

查看:120
本文介绍了获取组均值差的p值,而无需使用新的参考水平重新拟合线性模型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我们有一个线性模型,其因子变量为X(级别为ABC)

When we have a linear model with a factor variable X (with levels A, B, and C)

y ~ factor(X) + Var2 + Var3 

结果显示估算值XBXC,它们是差异B - AC - A. (假设引用为A).

The result shows the estimate XB and XC which is differences B - A and C - A. (suppose that the reference is A).

如果我们想知道BC之间的差异的p值:C - B, 我们应该将B或C指定为参考组,然后重新运行模型.

If we want to know the p-value of the difference between B and C: C - B, we should designate B or C as a reference group and re-run the model.

我们可以一次获得效果B - AC - AC - B的p值吗?

Can we get the p-values of the effect B - A, C - A, and C - B at one time?

推荐答案

您正在通过检查回归系数的某些线性组合的p值来寻找线性假设检验.根据我的回答:如何使用聚类协方差矩阵对回归系数进行线性假设检验?,其中我们仅考虑了总和系数,我将扩展功能LinearCombTest以处理更一般的情况,假设alpha作为vars中变量的一些组合系数:

You are looking for linear hypothesis test by check p-value of some linear combination of regression coefficients. Based on my answer: How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?, where we only considered sum of coefficients, I will extend the function LinearCombTest to handle more general cases, supposing alpha as some combination coefficients of variables in vars:

LinearCombTest <- function (lmObject, vars, alpha, .vcov = NULL) {
  ## if `.vcov` missing, use the one returned by `lm`
  if (is.null(.vcov)) .vcov <- vcov(lmObject)
  ## estimated coefficients
  beta <- coef(lmObject)
  ## linear combination of `vars` with combination coefficients `alpha`
  LinearComb <- sum(beta[vars] * alpha)
  ## get standard errors for sum of `LinearComb`
  LinearComb_se <- sum(alpha * crossprod(.vcov[vars, vars], alpha)) ^ 0.5
  ## perform t-test on `sumvars`
  tscore <- LinearComb / LinearComb_se
  pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
  ## return a matrix
  form <- paste0("(", paste(alpha, vars, sep = " * "), ")")
  form <- paste0(paste0(form, collapse = " + "), " = 0")
  matrix(c(LinearComb, LinearComb_se, tscore, pvalue), nrow = 1L,
         dimnames = list(form, c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
  }

考虑一个简单的示例,其中我们为三组ABC进行了平衡设计,组的均值分别为0、1、2.

Consider a simple example, where we have a balanced design for three groups A, B and C, with group mean 0, 1, 2, respectively.

x <- gl(3,100,labels = LETTERS[1:3])
set.seed(0)
y <- c(rnorm(100, 0), rnorm(100, 1), rnorm(100, 2)) + 0.1

fit <- lm(y ~ x)
coef(summary(fit))

#             Estimate Std. Error   t value     Pr(>|t|)
#(Intercept) 0.1226684 0.09692277  1.265631 2.066372e-01
#xB          0.9317800 0.13706949  6.797866 5.823987e-11
#xC          2.0445528 0.13706949 14.916177 6.141008e-38

由于A是参考水平,因此xB提供B - A,而xC提供C - A.假设我们现在对组BC之间的区别感兴趣,即C - B,我们可以使用

Since A is the reference level, xB is giving B - A while xC is giving C - A. Suppose we are now interested in the difference between group B and C, i.e., C - B, we can use

LinearCombTest(fit, c("xC", "xB"), c(1, -1))

#                         Estimate Std. Error  t value     Pr(>|t|)
#(1 * xC) + (-1 * xB) = 0 1.112773  0.1370695 8.118312 1.270686e-14

注意,此函数也很容易计算出BC的组均值,即(Intercept) + xB(Intercept) + xC:

Note, this function is also handy to work out the group mean of B and C, that is (Intercept) + xB and (Intercept) + xC:

LinearCombTest(fit, c("(Intercept)", "xB"), c(1, 1))

#                                 Estimate Std. Error  t value     Pr(>|t|)
#(1 * (Intercept)) + (1 * xB) = 0 1.054448 0.09692277 10.87926 2.007956e-23

LinearCombTest(fit, c("(Intercept)", "xC"), c(1, 1))

#                                 Estimate Std. Error  t value     Pr(>|t|)
#(1 * (Intercept)) + (1 * xC) = 0 2.167221 0.09692277 22.36029 1.272811e-65


使用lsmeans


Alternative solution with lsmeans

再次考虑上述玩具示例:

Consider the above toy example again:

library(lsmeans)
lsmeans(fit, spec = "x", contr = "revpairwise")

#$lsmeans
# x    lsmean         SE  df    lower.CL  upper.CL
# A 0.1226684 0.09692277 297 -0.06807396 0.3134109
# B 1.0544484 0.09692277 297  0.86370603 1.2451909
# C 2.1672213 0.09692277 297  1.97647888 2.3579637
#
#Confidence level used: 0.95 
#
#$contrasts
# contrast estimate        SE  df t.ratio p.value
# B - A    0.931780 0.1370695 297   6.798  <.0001
# C - A    2.044553 0.1370695 297  14.916  <.0001
# C - B    1.112773 0.1370695 297   8.118  <.0001
#
#P value adjustment: tukey method for comparing a family of 3 estimates

$lsmeans域返回边际组均值,而$contrasts返回成对分组均值差,因为我们使用了"revpairwise"对比.阅读 lsmeans 的第32页,以了解"pairwise""revpairwise".

The $lsmeans domain returns the marginal group mean, while $contrasts returns pairwise group mean difference, since we have used "revpairwise" contrast. Read p.32 of lsmeans for difference between "pairwise" and "revpairwise".

这当然很有趣,因为我们可以将其与LinearCombTest的结果进行比较.我们看到LinearCombTest正常运行.

Well this is certainly interesting, as we can compare with the result from LinearCombTest. We see that LinearCombTest is doing correctly.

这篇关于获取组均值差的p值,而无需使用新的参考水平重新拟合线性模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆