Grouping Over All Possible Combinations of Several Variables With dplyr
Problem description
Given data such as the following:
library(dplyr)
myData <- tbl_df(data.frame(var1 = rnorm(100),
                            var2 = letters[1:3] %>%
                              sample(100, replace = TRUE) %>%
                              factor(),
                            var3 = LETTERS[1:3] %>%
                              sample(100, replace = TRUE) %>%
                              factor(),
                            var4 = month.abb[1:3] %>%
                              sample(100, replace = TRUE) %>%
                              factor()))
I would like to group `myData` and eventually compute summary data for every possible combination of var2, var3, and var4 as grouping variables.
I can create a list of all possible combinations of the variable names, as character values, with:
groupNames <- names(myData)[2:4]
myGroups <- Map(combn,
                list(groupNames),
                seq_along(groupNames),
                simplify = FALSE) %>%
  unlist(recursive = FALSE)
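To make the structure of that list concrete, here is a minimal sketch that builds the same combinations with `lapply()` instead of `Map()` and prints them. It assumes the three grouping columns are named var2, var3 and var4, as in the data above; hard-coding the names is only for illustration.

```r
# Grouping variable names, hard-coded here so the sketch runs on its own
groupNames <- c("var2", "var3", "var4")

# For each size m = 1, 2, 3, list every combination of m names,
# then flatten the nested list by one level
myGroups <- unlist(
  lapply(seq_along(groupNames),
         function(m) combn(groupNames, m, simplify = FALSE)),
  recursive = FALSE
)

length(myGroups)                           # number of combinations
sapply(myGroups, paste, collapse = " + ")  # one label per combination
```

With three variables there are 2^3 - 1 = 7 non-empty combinations: three single variables, three pairs, and one triple.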
My plan was to create a separate data set for each combination of variables with a for() loop, something like:
### This Does Not Work
for (i in 1:length(myGroups)) {
  assign(myGroups[i] %>%
           unlist() %>%
           paste0(collapse = "") %>%
           paste0("Data"),
         myData %>%
           group_by_(lapply(myGroups[[i]], as.symbol)) %>%
           summarise(n = length(var1),
                     avgVar2 = var2 %>%
                       mean()))
}
Admittedly, I am not very good with lists, and researching this issue was a bit challenging because dplyr updates have changed how grouping works.
If there is a better way to do this than creating separate data sets, I would love to know.
I have gotten a loop similar to the one above working when grouping by only a single variable.
Any and all help is greatly appreciated! Thank you!
This seems convoluted, and there's probably a way to simplify it or fancy it up with a do, but it works. Using your myData and myGroups:
results = lapply(myGroups, FUN = function(x) {
  do.call(what = group_by_, args = c(list(myData), x)) %>%
    summarise(n = length(var1),
              avgVar1 = mean(var1))
})
> results[[1]]
Source: local data frame [3 x 3]

  var2  n     avgVar1
1    a 31  0.38929738
2    b 31 -0.07451717
3    c 38 -0.22522129

> results[[4]]
Source: local data frame [9 x 4]
Groups: var2

  var2 var3  n    avgVar1
1    a    A 11 -0.1159160
2    a    B 11  0.5663312
3    a    C  9  0.7904056
4    b    A  7  0.0856384
5    b    B 13  0.1309756
6    b    C 11 -0.4192895
7    c    A 15 -0.2783099
8    c    B 10 -0.1110877
9    c    C 13 -0.2517602

> results[[7]]
# I won't paste them here, but it has all 27 rows, grouped by var2, var3 and var4.
I changed your summarise call to average var1, since var2 isn't numeric.
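A note for readers on newer dplyr: group_by_() and the other underscore verbs have since been deprecated. The sketch below is one possible equivalent, assuming dplyr 1.0.0 or later, where `across(all_of(...))` accepts a character vector of column names. The `set.seed()` call, the use of `tibble()` in place of `tbl_df()`, the list names ending in "Data" (borrowed from the naming in the question's loop) and the `grouping` column are assumptions added for illustration, so the numbers will not match the output above.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
myData <- tibble(
  var1 = rnorm(100),
  var2 = factor(sample(letters[1:3], 100, replace = TRUE)),
  var3 = factor(sample(LETTERS[1:3], 100, replace = TRUE)),
  var4 = factor(sample(month.abb[1:3], 100, replace = TRUE))
)

# All non-empty combinations of the grouping variable names
groupNames <- names(myData)[2:4]
myGroups <- unlist(
  lapply(seq_along(groupNames),
         function(m) combn(groupNames, m, simplify = FALSE)),
  recursive = FALSE
)

# Name each combination after its variables, e.g. "var2var3Data",
# so the results live in one named list instead of separate objects
names(myGroups) <- paste0(sapply(myGroups, paste0, collapse = ""), "Data")

# Summarise once per combination; lapply() keeps the names
results <- lapply(myGroups, function(vars) {
  myData %>%
    group_by(across(all_of(vars))) %>%
    summarise(n = n(), avgVar1 = mean(var1), .groups = "drop")
})

results[["var2var3Data"]]

# Optional: stack everything into one table; grouping columns that a
# given combination does not use are filled with NA
allResults <- bind_rows(results, .id = "grouping")
```

Keeping the summaries in a named list avoids `assign()` and makes it easy to loop over or combine them later.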