R: Kruskal-Wallis test in loop over specified columns in data frame

Question

I would like to run a Kruskal-Wallis (KW) test on several numeric variables in a data frame, using one grouping variable. I'd prefer to do this in a loop instead of typing out every test, since there are many variables (more than in the example below).

Simulated data:

library(dplyr)
set.seed(123)
# tbl_df() is deprecated in current dplyr; as_tibble() is its replacement
Data <- as_tibble(
  data.frame(
    muttype  = as.factor(rep(c("missense", "frameshift", "nonsense"), each = 80)),
    ados.tsc = runif(240, 0, 10),
    ados.sa  = runif(240, 0, 10),
    ados.rrb = runif(240, 0, 10))
) %>%
  group_by(muttype)
ados.sim <- as.data.frame(Data)

The following code works fine outside a loop:

kruskal.test(formula(paste(colnames(ados.sim)[2], "~ muttype")),
             data = ados.sim)

But it does not work inside a loop:

for (i in names(ados.sim[, 2:4])) {
  ados.mtp <- kruskal.test(formula(paste(colnames(ados.sim)[i], "~ muttype")),
                           data = ados.sim)
}

I get this error message:

Error in terms.formula(formula, data = data) : invalid term in model formula

Does anybody know how to solve this? Much appreciated!

Recommended answer

Try:

results <- list()
for(i in names(ados.sim[,2:4])){  
  results[[i]] <- kruskal.test(formula(paste(i, "~ muttype")), data = ados.sim)
}

This also saves your results in a list and avoids overwriting ados.mtp on every iteration, which I suspect is not what you intended.
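
To illustrate what the list buys you, here is a small sketch (my addition, not part of the original answer) that pulls the chi-squared statistic and p-value out of each stored htest object with sapply(). It recreates the simulated data with base R only, so it runs without dplyr; the name summary_table is just for illustration.

```r
# Recreate the simulated data from the question (base R only)
set.seed(123)
ados.sim <- data.frame(
  muttype  = as.factor(rep(c("missense", "frameshift", "nonsense"), each = 80)),
  ados.tsc = runif(240, 0, 10),
  ados.sa  = runif(240, 0, 10),
  ados.rrb = runif(240, 0, 10)
)

# One Kruskal-Wallis test per column, stored under the column name
results <- list()
for (i in names(ados.sim[, 2:4])) {
  results[[i]] <- kruskal.test(formula(paste(i, "~ muttype")), data = ados.sim)
}

# Each element is an "htest" object; collect statistic and p-value per variable
summary_table <- data.frame(
  variable  = names(results),
  statistic = sapply(results, function(r) unname(r$statistic)),
  p.value   = sapply(results, function(r) r$p.value),
  row.names = NULL
)
print(summary_table)
```

Because the list is keyed by column name, a single test is also easy to reach later, e.g. results[["ados.rrb"]]$p.value.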

Note the following:

for(i in names(ados.sim[,2:4])){  
   print(i)
}
[1] "ados.tsc"
[1] "ados.sa"
[1] "ados.rrb"

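That is, i already gives you the name of the column. The problem in your code was that you used it as if it were an integer index into colnames(ados.sim). That vector has no names, so indexing it with a string returns NA, and the formula became "NA ~ muttype", which terms.formula() rejects as an invalid term: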

for(i in names(ados.sim[,2:4])){  
   print(paste((colnames(ados.sim)[i]), "~ muttype"))
}
[1] "NA ~ muttype"
[1] "NA ~ muttype"
[1] "NA ~ muttype"
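
A minimal sketch of the root cause, assuming a plain character vector as a stand-in for colnames(ados.sim): indexing an unnamed vector with a string matches nothing and returns NA. The last lines show two ways to build the formula correctly; reformulate() is a base R alternative to paste() that I'm suggesting here, not something the answer above used.

```r
# Stand-in for colnames(ados.sim): an unnamed character vector
cols <- c("muttype", "ados.tsc", "ados.sa", "ados.rrb")

# Indexing an unnamed vector by a string matches no name, so the result is NA
bad <- cols["ados.tsc"]
print(bad)

# paste() turns that NA into the text "NA", which is the formula string the
# loop handed to formula() before terms.formula() rejected it
print(paste(bad, "~ muttype"))

# Fix 1: i already holds the column name, so use it directly
i <- "ados.tsc"
f1 <- reformulate("muttype", response = i)

# Fix 2: loop over column positions, where an integer index is correct
j <- 2
f2 <- formula(paste(cols[j], "~ muttype"))

print(f1)
print(f2)
```

Either formula can then be passed to kruskal.test(..., data = ados.sim) inside the loop.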

Just for reference, all of this can also be done in the following two ways, which I often prefer because they make subsequent analysis slightly easier:

First, store all test objects in a data frame:

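library(tidyr)
# Reshape to long format (one row per observation and variable), then run
# one Kruskal-Wallis test per variable; gather() and do() still work but are
# superseded in current tidyr/dplyr (pivot_longer() replaces gather())
df <- ados.sim %>% gather(key, value, -muttype) %>%
      group_by(key) %>%
      do(test = kruskal.test(x = .$value, g = .$muttype))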

You can then subset the data frame to get the test outcomes:

df[df$key == "ados.rrb",]$test
[[1]]

    Kruskal-Wallis rank sum test

data:  .$value and .$muttype
Kruskal-Wallis chi-squared = 2.2205, df = 2, p-value = 0.3295

Alternatively, get all results directly in a data frame, without storing the test objects:

library(broom)
df2 <- ados.sim %>% gather(key, value, -muttype) %>% 
       group_by(key) %>% 
       do(tidy(kruskal.test(x= .$value, g = .$muttype)))
df2
# A tibble: 3 x 5
# Groups:   key [3]
       key statistic   p.value parameter                       method
     <chr>     <dbl>     <dbl>     <int>                       <fctr>
1 ados.rrb 2.2205031 0.3294761         2 Kruskal-Wallis rank sum test
2  ados.sa 0.1319554 0.9361517         2 Kruskal-Wallis rank sum test
3 ados.tsc 0.3618102 0.8345146         2 Kruskal-Wallis rank sum test
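
If you would rather not depend on tidyr and broom, the same kind of table can be sketched in base R with lapply() and do.call(rbind, ...). This assumes the simulated data and column names from the question; df3 is a hypothetical name, and the columns only mirror the tidy() output above rather than reproducing broom's exact column types.

```r
# Recreate the simulated data from the question (base R only)
set.seed(123)
ados.sim <- data.frame(
  muttype  = as.factor(rep(c("missense", "frameshift", "nonsense"), each = 80)),
  ados.tsc = runif(240, 0, 10),
  ados.sa  = runif(240, 0, 10),
  ados.rrb = runif(240, 0, 10)
)

vars <- c("ados.tsc", "ados.sa", "ados.rrb")

# One single-row data frame per variable, using the default (x, g) method
rows <- lapply(vars, function(v) {
  res <- kruskal.test(x = ados.sim[[v]], g = ados.sim$muttype)
  data.frame(
    key       = v,
    statistic = unname(res$statistic),
    p.value   = res$p.value,
    parameter = unname(res$parameter),
    method    = res$method,
    stringsAsFactors = FALSE
  )
})

# Stack the rows and sort by key, like the grouped output above
df3 <- do.call(rbind, rows)
df3 <- df3[order(df3$key), ]
print(df3)
```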
