Kruskal-Wallis test: create lapply function to subset data.frame?


Problem description

I have a data set of values (val) grouped by two categories (distance and phase). I would like to run a Kruskal-Wallis test in which val is the dependent variable and distance is the grouping factor, separately for each of the 3 groups defined by phase.

To do this, I need to specify a subset of the data within the Kruskal-Wallis test and then apply the test to each group. But I cannot get my subsetting to work!

The R help describes subset as "an optional vector specifying a subset of observations to be used". But how do I use it correctly in my lapply call?

My dummy data:

# create data
val<-runif(60, min = 0, max = 100)
distance<-floor(runif(60, min=1, max=3))
phase<-rep(c("a", "b", "c"), 20)

df<-data.frame(val, distance, phase)

# get unique groups
ii<-unique(df$phase)

# get basic statistics per group
aggregate(val ~ distance + phase, df, mean)

# run Kruskal test, specify the subset
kruskal.test(df$val ~df$distance,
             subset = phase == "c")

This works well, so my subset seems to be correctly specified as a vector. But how do I use it in an lapply call?

# DOES not work!!
lapply(ii, kruskal.test(df$val ~ df$distance,
                        subset = df$phase == as.character(ii))) 

My overall goal is to create a function from kruskal.test and save all statistics for each group into one table.

All help is highly appreciated.

Recommended answer

Usually you would start by splitting, and then lapplying.

Something like

lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })

would yield a list, indexed by phase, of the results of kruskal.test.
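As a minimal sketch of what that list looks like, the snippet below rebuilds a data frame with the same columns as in the question and pulls out individual results by phase name. The seed and the deterministic distance column are assumptions made here so that every phase is guaranteed to contain both distance levels; they are not part of the original data.

```r
# Illustrative data with the same columns as in the question.
# Assumption: distance is built deterministically (not randomly, as in
# the question) so each phase contains both distance levels.
set.seed(1)
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = rep(1:2, each = 3, times = 10),
  phase    = rep(c("a", "b", "c"), 20),
  stringsAsFactors = FALSE
)

# One kruskal.test result per phase
res <- lapply(split(df, df$phase), function(d) {
  kruskal.test(val ~ distance, data = d)
})

names(res)            # "a" "b" "c"
res$c                 # full printed test result for phase "c"
res[["c"]]$p.value    # only the p-value for phase "c"
```

Because split names each piece after its phase, the results can be looked up by name rather than by position.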

Your final expression does not work because lapply expects a function as its second argument, and calling kruskal.test does not return a function; it returns the result of running that test. The subset also has to compare phase against a single element, not against the whole vector ii. If you wrap the call in a function definition that takes the index, it works; it is just a little less idiomatic.

lapply(ii, function(i) { kruskal.test(df$val ~ df$distance, subset=df$phase==i )})
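For the stated goal of saving all statistics for each group into one table, one possible approach is to extract the relevant fields from each test result and bind them row by row. This is only a sketch: the data are illustrative (fixed seed and a deterministic distance column are assumptions, chosen so each phase has both distance levels), and the name tab and the selected columns are hypothetical choices. The fields statistic, parameter and p.value are components of the "htest" object that kruskal.test returns.

```r
# Illustrative data with the same columns as in the question.
# Assumption: distance is deterministic so each phase has both levels.
set.seed(1)
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = rep(1:2, each = 3, times = 10),
  phase    = rep(c("a", "b", "c"), 20),
  stringsAsFactors = FALSE
)

# One kruskal.test result per phase
res <- lapply(split(df, df$phase), function(d) {
  kruskal.test(val ~ distance, data = d)
})

# Turn each result into a one-row data frame, then stack the rows.
# unname() drops the element names so they do not become row names.
tab <- do.call(rbind, lapply(names(res), function(p) {
  r <- res[[p]]
  data.frame(
    phase     = p,
    statistic = unname(r$statistic),  # Kruskal-Wallis chi-squared
    df        = unname(r$parameter),  # degrees of freedom
    p.value   = unname(r$p.value),
    stringsAsFactors = FALSE
  )
}))

tab
```

The result has one row per phase, so it can be written out with write.csv or filtered like any other data frame.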
