Bootstrapped p-value for a correlation coefficient in R

Question

In R, I used the bootstrap method to get an estimate of the correlation coefficient and its confidence interval. To get the p-value, I thought I could calculate the proportion of confidence intervals that do not contain zero, but this is not the solution.

How can I get the p-value in this case?

I am using cor.test to get the coefficient estimate. cor.test also gives me the p-value of each individual test, but how can I get the bootstrapped p-value?

Thank you very much!

Here is an example:

Brep <- 1000   # number of bootstrap resamples (not defined in the original snippet; assumed value)
n <- 30
data <- matrix(data = c(rnorm(n), rnorm(n), rnorm(n), rpois(n, 1),
                        rbinom(n, 1, 0.6)), nrow = n, byrow = FALSE)
data <- as.data.frame(data)
# Each column of z1 holds the row indices of one bootstrap resample
z1  <- replicate(Brep, sample(1:nrow(data), nrow(data), replace = TRUE))
res <- do.call(rbind, apply(z1, 2, function(x) {
    res <- cor.test(data$V1[x], data$V2[x])
    list(res$p.value, res$estimate)
}))

coeffcorr  <- mean(unlist(res[, 2]), na.rm = TRUE)             # bootstrapped coefficient
confInter1 <- quantile(unlist(res[, 2]), 0.025, na.rm = TRUE)  # lower 95% percentile bound
confInter2 <- quantile(unlist(res[, 2]), 0.975, na.rm = TRUE)  # upper 95% percentile bound
p.value    <- mean(unlist(res[, 1]), na.rm = TRUE)             # mean of the per-resample p-values
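The interval idea in the question can be made precise by inverting the percentile interval of the bootstrapped correlations rather than counting individual intervals: the two-sided p-value is roughly the smallest alpha at which the (1 - alpha) percentile interval excludes zero, which is twice the smaller fraction of resampled correlations lying on one side of zero. The base-R sketch below assumes a hypothetical helper named bootCorPvalue and B = 2000 resamples; it is one common approximation, not a method taken from the question or the answer.

```r
# Hypothetical helper: two-sided bootstrap p-value for a correlation,
# obtained by inverting the percentile interval (an approximation).
bootCorPvalue <- function(x, y, B = 2000) {
  n <- length(x)
  # Correlation in each bootstrap resample (rows resampled as pairs)
  r_star <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    cor(x[idx], y[idx])
  })
  r_star <- r_star[!is.na(r_star)]  # drop degenerate resamples, if any
  # Fraction of resampled correlations on each side of zero
  p_low  <- mean(r_star <= 0)
  p_high <- mean(r_star >= 0)
  # The (1 - alpha) percentile interval excludes 0 roughly when
  # alpha exceeds twice the smaller tail fraction
  min(1, 2 * min(p_low, p_high))
}

set.seed(1)
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
bootCorPvalue(x, y)
```

With only a few thousand resamples the smallest attainable nonzero value is about 2 / B, so very small p-values are reported coarsely or as 0.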

Answer

The standard way of bootstrapping in R is the package boot, a recommended package that ships with R. You start by defining the statistic function, which takes two arguments: the dataset and a vector of indices into the dataset. This is the function bootCorTest below. Inside the function you subset the dataset, keeping only the rows given by the indices.

The rest is straightforward.

library(boot)

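# Statistic function: boot() passes the data and a vector of row indices
bootCorTest <- function(data, i){
    d <- data[i, ]                  # the bootstrap resample
    cor.test(d$x, d$y)$p.value      # p-value of the Pearson test on the resample
}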


# First dataset in help("cor.test")
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
dat <- data.frame(x, y)

b <- boot(dat, bootCorTest, R = 1000)

b$t0        # p-value of cor.test on the original data
#[1] 0.10817

mean(b$t)   # mean of the 1000 bootstrapped p-values
#[1] 0.134634

boot.ci(b)

For more information on the results of functions boot and boot.ci see their respective help pages.
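Since cor.test already supplies an analytic p-value, it can be more informative to bootstrap the correlation coefficient itself and let boot.ci build the interval. A minimal sketch, assuming a new statistic function named bootCor (hypothetical, not part of the answer) and the percentile interval type:

```r
library(boot)

# Hypothetical statistic function: the correlation of the resampled rows
bootCor <- function(data, i) {
  d <- data[i, ]
  cor(d$x, d$y)
}

# First dataset in help("cor.test")
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
dat <- data.frame(x, y)

set.seed(2023)
bc <- boot(dat, bootCor, R = 2000)
bc$t0                       # sample correlation, identical to cor(x, y)
boot.ci(bc, type = "perc")  # 95% percentile interval for the correlation
```

If this interval excludes zero, the correlation is significant at the 5% level in the percentile-interval sense; with n = 9 the interval will be wide.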

Edit:

If you want to return several values from the boot statistic function, you should return a vector. In the following case, bootCorTest2 returns a named vector with the required values.

Note that I set the RNG seed to make the results reproducible; I should have done so above as well.

set.seed(7612)    # Make the results reproducible

bootCorTest2 <- function(data, i){
    d <- data[i, ]
    res <- cor.test(d$x, d$y)
    c(stat = res$statistic, p.value = res$p.value)
}


b2 <- boot(dat, bootCorTest2, R = 1000)

b2$t0
#  stat.t  p.value 
#1.841083 0.108173


colMeans(b2$t)
#[1] 2.869479 0.133857
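The mean of the bootstrapped p-values (colMeans(b2$t) above) is not itself a bootstrap test of zero correlation, because those resamples are drawn from the observed data rather than under the null hypothesis. One alternative, sketched below under the assumption that the null of interest is independence of x and y, resamples x and y separately so that the resampled correlations approximate the null distribution. The helper name bootCorNullTest and B = 2000 are hypothetical choices, not part of the answer.

```r
# Hypothetical helper: bootstrap test of H0 "x and y are independent".
# Resampling x and y separately breaks the pairing, so the resampled
# correlations approximate the distribution of r under the null.
bootCorNullTest <- function(x, y, B = 2000) {
  n <- length(x)
  r_obs <- cor(x, y)
  r_null <- replicate(B, cor(x[sample.int(n, n, replace = TRUE)],
                             y[sample.int(n, n, replace = TRUE)]))
  r_null <- r_null[!is.na(r_null)]  # drop degenerate resamples, if any
  # Two-sided p-value with the usual +1 correction, so it is never exactly 0
  p <- (1 + sum(abs(r_null) >= abs(r_obs))) / (length(r_null) + 1)
  list(estimate = r_obs, p.value = p)
}

set.seed(7612)
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
bootCorNullTest(x, y)
```

A permutation test (shuffling y without replacement) is a closely related choice for the same null hypothesis.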
