Using the Kolmogorov-Smirnov Test in R


Question

I designed 3000 experiments. Each experiment has 4 groups (treatments), and each group has 50 individuals (subjects). For each experiment I run a standard one-way ANOVA and then check whether the resulting p-values are uniformly distributed under the null hypothesis. However, ks.test rejects uniformity, and I can't see why.

library(mvtnorm)  # provides rmvnorm()

subject <- 50
treatment <- 4
experiment <- list()
R <- 3000
# one distinct seed per (experiment, subject) pair
seed <- split(1:(R*subject), 1:R)
for(i in 1:R){
  e <- c()
  for(j in 1:subject){
    set.seed(seed[[i]][j])  # reseeds the generator before every draw
    e <- c(e, rmvnorm(mean=rep(0,treatment), sigma=diag(3,4), n=1, method="chol"))
  }
  experiment <- c(experiment, list(matrix(e, subject, treatment, byrow=TRUE)))
}

# one-way ANOVA p-value for each experiment
p.values <- c()
for(e in experiment){
  d <- data.frame(response=c(e), treatment=factor(rep(1:treatment, each=subject)))
  p.values <- c(p.values, anova(lm(response~treatment, d))[1,"Pr(>F)"])
}

ks.test(p.values, punif, alternative = "two.sided")

Recommended answer

I commented out the lines in your code that change the random seed and got a p-value of 0.34. That was with an unknown seed, so for reproducibility I called set.seed(1) and ran it again. This time I got a p-value of 0.98.

As to why this makes a difference: I'm not an expert in PRNGs, but any decent generator will ensure that successive draws are statistically independent for all practical purposes. The best ones ensure the same for greater lags; e.g. the Mersenne Twister, which is R's default PRNG, guarantees it for lags up to 623 (IIRC). Those guarantees apply to a single stream started from one seed, so meddling with the seed between draws is likely to impair the statistical properties of the draws.
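
One way to look at the effect of reseeding is to compare draws taken from a single seeded stream with draws taken immediately after a fresh set.seed() call each time. The following is only a sketch: the names n, stream_draws, and reseeded_draws are made up for illustration, it reseeds before every single draw rather than before each rmvnorm() call as in the question, and how far the reseeded sample departs from uniformity (if at all) may depend on the R version and the seeds used.

```r
n <- 2000

# One seed, then let the generator run: the intended usage.
set.seed(1)
stream_draws <- runif(n)

# Reseed before every single draw with consecutive seeds,
# similar in spirit to the loop in the question.
reseeded_draws <- vapply(seq_len(n), function(s) {
  set.seed(s)
  runif(1)
}, numeric(1))

# Test both samples against the uniform distribution on [0, 1].
ks_stream   <- ks.test(stream_draws, "punif")
ks_reseeded <- ks.test(reseeded_draws, "punif")
print(c(stream = ks_stream$p.value, reseeded = ks_reseeded$p.value))
```

Running this with several different starting seeds for the single stream gives a feel for how much the KS p-value varies by chance alone.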

Your code is also doing things in a really inefficient way. You create a list for the experiments and append one item per experiment. Within each experiment you grow a vector one observation at a time before reshaping it into a matrix. Then you do something very similar for the p-values. I'll see if I can fix that up.
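
To make the inefficiency concrete: growing a vector with c() inside a loop copies it on every iteration, whereas vapply() fills a result whose size is known up front. A minimal sketch, with a toy computation standing in for the ANOVA; slow_collect and fast_collect are hypothetical names used only here.

```r
# Grow the result one element at a time (the pattern used in the question).
slow_collect <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)   # reallocates and copies `out` on each iteration
  }
  out
}

# Let vapply() allocate the full result once.
fast_collect <- function(n) {
  vapply(seq_len(n), function(i) i^2, numeric(1))
}

# Both give the same values; only the cost differs.
stopifnot(identical(slow_collect(1000), fast_collect(1000)))
```

Wrapping each call in system.time() for a larger n shows the difference in cost.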

This is how I'd replace your code. Strictly speaking I could make it even tighter by avoiding formulas, creating the bare model matrix, and calling lm.fit directly. But that would mean coding up the ANOVA F-test by hand rather than simply calling anova, which is more trouble than it's worth.
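
For the curious, the lm.fit route could look roughly like the sketch below. This is not part of the original answer; it assumes a single treatment factor with an intercept, and the names X, f_stat, p_manual, and p_anova are illustrative. The last lines cross-check the hand-computed p-value against anova().

```r
set.seed(1)
treatment <- 4
subject <- 50
x <- factor(rep(seq_len(treatment), each = subject))
y <- rnorm(subject * treatment, sd = 3)

# Bare design matrix: intercept plus treatment contrasts.
X <- model.matrix(~ x)
fit <- lm.fit(X, y)

rss <- sum(fit$residuals^2)    # residual sum of squares
tss <- sum((y - mean(y))^2)    # total sum of squares about the mean
df1 <- fit$rank - 1            # treatment degrees of freedom
df2 <- fit$df.residual         # residual degrees of freedom

# One-way ANOVA F statistic and its upper-tail p-value.
f_stat <- ((tss - rss) / df1) / (rss / df2)
p_manual <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Cross-check against the formula interface.
p_anova <- anova(lm(y ~ x))[1, "Pr(>F)"]
stopifnot(isTRUE(all.equal(p_manual, p_anova)))
```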

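set.seed(1) # or any other number you like; seed once, not inside the loop

# subject, treatment and R are as defined in the question
x <- factor(rep(seq_len(treatment), each=subject))
p.values <- sapply(seq_len(R), function(r) {
    # sd = 3 rather than variance 3 as in the question; the F-test is
    # scale-invariant, so this does not affect the p-values
    y <- rnorm(subject * treatment, sd=3)
    anova(lm(y ~ x))[1,"Pr(>F)"]
})
ks.test(p.values, punif, alternative = "two.sided")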


        One-sample Kolmogorov-Smirnov test

data:  p.values
D = 0.0121, p-value = 0.772
alternative hypothesis: two-sided
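
For reference, the D reported above is the largest vertical distance between the empirical CDF of the p-values and the Uniform(0, 1) CDF. The sketch below recomputes it by hand on made-up uniform data (not the p-values from the simulation); manual_D and ks_D are illustrative names.

```r
set.seed(1)
p <- runif(500)          # stand-in for a vector of p-values
n <- length(p)
p_sorted <- sort(p)

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th sorted value;
# for Uniform(0, 1) the theoretical CDF at that point is the value itself.
d_plus  <- max(seq_len(n) / n - p_sorted)
d_minus <- max(p_sorted - (seq_len(n) - 1) / n)
manual_D <- max(d_plus, d_minus)

# Compare with the statistic computed by ks.test().
ks_D <- unname(ks.test(p, "punif")$statistic)
stopifnot(isTRUE(all.equal(manual_D, ks_D)))
```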
