Bootstrapped correlation in R


Question

I am trying to do a bootstrapped correlation in R. I have two variables Var1 and Var2 and I want to get the bootstrapped p.value of the Pearson correlation.

My variables look like this:
      x            y
1   .6080522    1.707642
2   1.4307273   1.772616
3   0.8226198   1.768537
4   1.7714221   1.265276
5   1.5986213   1.855719
6   1.0000000   1.606106
7   1.1678940   1.671457
8   0.6630012   1.608428
9   1.0842423   1.670619
10  0.5592512   1.107783
11  1.6442616   1.492832
12  0.8326965   1.643923
13  1.1696954   1.763181
14  0.7484543   1.762921
15  1.0842423   1.591566
16  0.9014748   1.718669
17  0.7604917   1.782863
18  0.8566499   1.796216
19  1.4307273   1.913675
20  1.7579695   1.903155

So far I have this:

data = as.data.frame(data)
x = data$Var1
y = data$Var2
dat = data.frame(x,y)

library(boot)
set.seed(1)
bootCorTest3 <- function(data, i){
  d <- data[i, ]
  results  <- cor.test(d$x, d$y, method='pearson')
  c(est = results$estimate, stat = results$statistic, param = results$parameter, p.value = results$p.value, CI = results$conf.int)
}


b3 <- boot(dat, bootCorTest3, R = 1000)
b3

# Original (non-bootstrap) statistics with label
b3$t0
colMeans(b3$t)
boot.ci(b3, type = c("norm", "basic", "perc", "bca")) #bootstrapped CI. 

The bootstrapped p value should be the one I get with colMeans(b3$t), right?

colMeans(b3$t) gives me this:

est.cor      stat.t    param.df     p.value         CI1         CI2
 0.28495324  2.13981008 48.00000000  0.14418623  0.01438146  0.51726022

It seems like everything is working fine. The problem is that I ran the same analysis in different software and the results are very different. The p-value I get here is much higher than the one from the other program. I think I may have done something wrong here, as I am not strong in R.

Can anyone give me some feedback on this code? Am I doing something wrong? How would you get the bootstrapped p-value for the Pearson correlation?

Thank you for your time.

Recommended answer

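If you want to bootstrap your correlation test, you only need to return the correlation coefficient from your bootstrap statistic function. Bootstrapping the p-value of the correlation test is not appropriate in this case, because the two-sided p-value ignores the direction of the correlation: a resample with a strongly negative correlation and one with an equally strong positive correlation both yield a small p-value.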
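To make the point about direction concrete, here is a small sketch with simulated data (the variables `x`, `y_pos`, and `y_neg` are made up for illustration and are not the data from the question). Two samples whose correlations have the same size but opposite sign give the same two-sided p-value, so averaging p-values over resamples cannot tell a positive association from a negative one.

```r
# Simulated data for illustration only (not the data from the question)
set.seed(1)
x     <- rnorm(30)
y_pos <- x + rnorm(30)   # positively correlated with x
y_neg <- -y_pos          # same strength, opposite direction

pos <- cor.test(x, y_pos, method = "pearson")
neg <- cor.test(x, y_neg, method = "pearson")

# The estimates differ only in sign ...
c(pos = unname(pos$estimate), neg = unname(neg$estimate))

# ... but the default two-sided p-values are the same
c(pos = pos$p.value, neg = neg$p.value)
```

The correlation coefficient itself keeps the sign, which is why it is the more useful quantity to resample.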

Check this question on CrossValidated for some nice answers on performing bootstrap hypothesis tests: https://stats.stackexchange.com/questions/20701/computing-p-value-using-bootstrap-with-r

library("boot")
# Read the two columns posted in the question (the file has no header row)
data <- read.csv("~/Documents/stack/tmp.csv", header = FALSE)
colnames(data) <- c("x", "y")

set.seed(1)

b3 <- boot(data, 
  statistic = function(data, i) {
    cor(data[i, "x"], data[i, "y"], method='pearson')
  },
  R = 1000
)
b3
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot(data = data, statistic = function(data, i) {
#>     cor(data[i, "x"], data[i, "y"], method = "pearson")
#> }, R = 1000)
#> 
#> 
#> Bootstrap Statistics :
#>      original        bias    std. error
#> t1* 0.1279691 -0.0004316781    0.314056
boot.ci(b3, type = c("norm", "basic", "perc", "bca")) #bootstrapped CI. 
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 1000 bootstrap replicates
#> 
#> CALL : 
#> boot.ci(boot.out = b3, type = c("norm", "basic", "perc", "bca"))
#> 
#> Intervals : 
#> Level      Normal              Basic         
#> 95%   (-0.4871,  0.7439 )   (-0.4216,  0.7784 )  
#> 
#> Level     Percentile            BCa          
#> 95%   (-0.5225,  0.6775 )   (-0.5559,  0.6484 )  
#> Calculations and Intervals on Original Scale

plot(density(b3$t))
abline(v = 0, lty = "dashed", col = "grey60")

Even without a p-value, it's quite safe to say that the bootstrap distribution of the correlation is centered near zero: the original estimate is about 0.13, and every 95% interval above spans zero.
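If a single number is still wanted, one option is to look at how the bootstrap correlations split around zero. The sketch below is only an illustration under stated assumptions: the data are simulated (not the asker's), the names `dat`, `boot_cor`, `b`, and `p_boot` are hypothetical, and the percentile-inversion rule used for the p-value is one of several possible conventions rather than the definitive method.

```r
library(boot)

# Simulated stand-in for the two columns (not the data from the question)
set.seed(1)
dat <- data.frame(x = rnorm(20))
dat$y <- 0.1 * dat$x + rnorm(20)

# Statistic: Pearson correlation of the resampled rows
boot_cor <- function(d, i) cor(d$x[i], d$y[i], method = "pearson")
b <- boot(dat, statistic = boot_cor, R = 2000)

# Share of bootstrap correlations on each side of zero
below <- mean(b$t <= 0)
above <- mean(b$t >= 0)

# Rough two-sided p-value by percentile inversion (an assumption of this
# sketch; other ways to turn a bootstrap distribution into a p-value exist)
p_boot <- min(1, 2 * min(below, above))
c(below_zero = below, above_zero = above, p_boot = p_boot)
```

Unlike the averaged `cor.test` p-values, this uses the sign of each resampled correlation, so a distribution sitting mostly on one side of zero gives a small value and one straddling zero gives a large one.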
