Bootstrapped p-value for a correlation coefficient in R
Question
In R, I used the bootstrap method to get an estimate of a correlation coefficient and its confidence interval.
To get the p-value, I thought I could calculate the proportion of the confidence intervals that do not contain zero. But this is not the solution.
How can I get the p-value in this case?
I am using cor.test to get the coefficient estimate. cor.test also gives me the p-value from each test. But how can I get the bootstrapped p-value?
Thank you very much!
Below is an example:
n = 30
Brep = 1000 # number of bootstrap replicates (not defined in the original post; 1000 is an assumed value)
data = matrix(data = c(rnorm(n), rnorm(n), rnorm(n), rpois(n, 1),
                       rbinom(n, 1, 0.6)), nrow = n, byrow = FALSE)
data = as.data.frame(data)
# each column of z1 holds the row indices of one bootstrap resample
z1 = replicate(Brep, sample(1:dim(data)[1], dim(data)[1], replace = TRUE))
res = do.call(rbind, apply(z1, 2, function(x){ res = cor.test(data$V1[x], data$V2[x]); return(list(res$p.value, res$estimate)) }))
coeffcorr = mean(unlist(res[,2]), na.rm = TRUE) # bootstrapped coefficient
confInter1 = quantile(unlist(res[,2]), c(0.025, 0.975), na.rm = TRUE)[1] # lower limit of the 95% confidence interval
confInter2 = quantile(unlist(res[,2]), c(0.025, 0.975), na.rm = TRUE)[2] # upper limit of the 95% confidence interval
p.value = mean(unlist(res[,1]), na.rm = TRUE) # mean of the per-resample p-values
Recommended answer
The standard way of bootstrapping in R is to use the boot package, which ships with R as a recommended package. You start by defining the bootstrap statistic function, a function that takes two arguments: the dataset and an index into the dataset. This is the function bootCorTest below. In the function, you subset the dataset, selecting just the rows defined by the index.
The rest is simple.
library(boot)
# Statistic function for boot(): i is the vector of resampled row indices
bootCorTest <- function(data, i){
  d <- data[i, ]
  cor.test(d$x, d$y)$p.value
}
# First dataset in help("cor.test")
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
dat <- data.frame(x, y)
b <- boot(dat, bootCorTest, R = 1000)
b$t0
#[1] 0.10817
mean(b$t)
#[1] 0.134634
boot.ci(b)
For more information on the results of the functions boot and boot.ci, see their respective help pages.
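As a quick orientation to those results, here is a minimal sketch that bootstraps the correlation estimate itself rather than the p-value. The function name bootCor and the seed are illustrative choices, not part of the original answer; the data are the same as above.

```r
library(boot)

# Same data as the first example in help("cor.test")
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
dat <- data.frame(x, y)

# Hypothetical statistic function: the correlation estimate,
# computed on the rows selected by the index vector i
bootCor <- function(data, i){
  d <- data[i, ]
  cor(d$x, d$y)
}

set.seed(2024)  # arbitrary seed, only for reproducibility
bc <- boot(dat, bootCor, R = 1000)

bc$t0                             # statistic computed on the original data
dim(bc$t)                         # one row per resample, one column per returned value
ci <- boot.ci(bc, type = "perc")  # request only the percentile interval
ci$percent[4:5]                   # lower and upper limits of that interval
```

Here t0 holds the statistic on the original data and t holds one row per bootstrap replicate. Asking boot.ci for a single interval type avoids the warnings it gives when some types cannot be computed.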
Edit.
If you want to return several values from the boot statistic function, you should return a vector. The function bootCorTest2 below returns a named vector with the required values.
Note that I set the RNG seed to make the results reproducible. I should already have done it above.
set.seed(7612) # Make the results reproducible
# Statistic function returning both the test statistic and the p-value
bootCorTest2 <- function(data, i){
  d <- data[i, ]
  res <- cor.test(d$x, d$y)
  c(stat = res$statistic, p.value = res$p.value)
}
b2 <- boot(dat, bootCorTest2, R = 1000)
b2$t0
# stat.t p.value
#1.841083 0.108173
colMeans(b2$t)
#[1] 2.869479 0.133857
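When the statistic function returns several values, boot.ci needs to be told which one to use through its index argument. The following sketch, which repeats the setup so that it runs on its own, computes percentile intervals for each returned value separately. The names ci_stat and ci_p are illustrative.

```r
library(boot)

# Same data as the first example in help("cor.test")
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
dat <- data.frame(x, y)

bootCorTest2 <- function(data, i){
  d <- data[i, ]
  res <- cor.test(d$x, d$y)
  c(stat = res$statistic, p.value = res$p.value)
}

set.seed(7612)  # Make the results reproducible
b2 <- boot(dat, bootCorTest2, R = 1000)

# index = 1 selects the first returned value (the t statistic),
# index = 2 selects the second one (the p-value)
ci_stat <- boot.ci(b2, type = "perc", index = 1)
ci_p    <- boot.ci(b2, type = "perc", index = 2)

ci_stat$percent[4:5]  # percentile limits for the t statistic
ci_p$percent[4:5]     # percentile limits for the p-value
```

Passing a single index matters because, by default, boot.ci treats a second returned value as a variance estimate for the first, which is not what the p-value is here.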