在R中使用Kolmogorov Smirnov检验 [英] Using Kolmogorov Smirnov Test in R

查看:135
本文介绍了在R中使用Kolmogorov Smirnov检验的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我设计了3000个实验,因此在一个实验中有4个组(治疗),每组有50个个体(受试者).对于每个实验,我都进行了标准的单向方差分析,并证明了在原假设下它们的p.值是否具有单概率函数,但是ks.test拒绝了这一假设,我不明白为什么?

I designed 3000 experiments, so that in one experiment there are 4 groups (treatment), in each group there are 50 individuals (subjects). For each experiment I do a standard one way ANOVA and proof if their p.values has a uni probability function under the null-hypothesis, but ks.test rejects this assumption and I cant see why?

subject<-50
treatment<-4
experiment<-list()
R<-3000
seed<-split(1:(R*subject),1:R)
for(i in 1:R){
  e<-c()
  for(j in 1:subject){
    set.seed(seed[[i]][j]) 
    e<-c(e,rmvnorm(mean=rep(0,treatment),sigma=diag(3,4),n=1,method="chol"))
   }
  experiment<-c(experiment,list(matrix(e,subject,treatment,byrow=T)))
 }

 p.values<-c()
for(e in experiment){
  d<-data.frame(response=c(e),treatment=factor(rep(1:treatment,each=subject)))
  p.values<-c(p.values,anova(lm(response~treatment,d))[1,"Pr(>F)"])
 }

 ks.test(p.values, punif,alternative = "two.sided")

推荐答案

我注释掉了代码中更改随机种子的行,并获得了0.34的P值.那是一个未知的种子,因此为了重现性,我做了set.seed(1)然后再次运行它.这次,我的P值为0.98.

I commented out the lines in your code that change the random seed, and got a P-value of .34. That was with an unknown seed, so for reproducibility, I did set.seed(1) and ran it again. This time, I got a P-value of 0.98.

关于为什么会有所不同,我不是PRNG的专家,但是任何体面的生成器都将确保连续抽奖在统计学上独立于所有实际目的.最好的将确保更大的延迟,例如,M的Mersenne Twister是R的默认PRNG,可以保证高达623(IIRC)的延迟.实际上,干预种子可能会损害抽奖的统计属性.

As to why this makes a difference, I'm not an expert in PRNGs, but any decent generator will ensure successive draws are statistically independent for all practical purposes. The best ones will ensure the same for greater lags, eg the Mersenne Twister which is R's default PRNG guarantees it for lags up to 623 (IIRC). In fact, meddling with the seed is likely to impair the statistical properties of the draws.

您的代码也以一种非常低效的方式执行操作.您正在为实验创建列表,并为每个实验添加一项.在每个实验中 中,您还创建了一个矩阵,并为每个观察值添加了一行.然后,对P值执行非常类似的操作.我会看看是否可以解决该问题.

Your code is also doing things in a really inefficient way. You're creating a list for the experiments, and adding one item for each experiment. Within each experiment, you also create a matrix, and add a row for each observation. Then you do something very similar for the P-values. I'll see if I can fix that up.

这就是我要替换您的代码的方式.严格来说,通过避免公式,创建裸模型矩阵并直接调用lm.fit,我可以使其更加严格.但这意味着必须手动编写方差分析测试,而不是简单地调用anova,这比其应有的麻烦还要多.

This is how I'd replace your code. Strictly speaking I could make it even tighter, by avoiding formulas, creating the bare model matrix, and calling lm.fit directly. But that would mean having to manually code up the ANOVA test rather than simply calling anova, which is more trouble than it's worth.

set.seed(1) # or any other number you like

x <- factor(rep(seq_len(treatment), each=subject))
p.values <- sapply(seq_len(R), function(r) {
    y <- rnorm(subject * treatment, s=3)
    anova(lm(y ~ x))[1,"Pr(>F)"]
})
ks.test(p.values, punif,alternative = "two.sided")


        One-sample Kolmogorov-Smirnov test

data:  p.values
D = 0.0121, p-value = 0.772
alternative hypothesis: two-sided

这篇关于在R中使用Kolmogorov Smirnov检验的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆