Grouping Over All Possible Combinations of Several Variables With dplyr


Problem description

Given a situation such as the following:

library(dplyr)
myData <- tbl_df(data.frame( var1 = rnorm(100), 
                             var2 = letters[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor(), 
                             var3 = LETTERS[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor(), 
                             var4 = month.abb[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor()))

I would like to group `myData` and eventually compute summary data grouped by every possible combination of var2, var3, and var4.

I can create a list with all possible combinations of variables as character values with

groupNames <- names(myData)[2:4]

myGroups <- Map(combn, 
              list(groupNames), 
              seq_along(groupNames),
              simplify = FALSE) %>%
              unlist(recursive = FALSE)
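
As a quick check of what this list holds, here is a minimal base R sketch. It assumes the three grouping columns are named var2, var3 and var4 (hard-coded here so the snippet runs without myData) and builds the list with the same Map/combn call, written without the pipe.

```r
# Assumption: the column names are hard-coded instead of read from myData
groupNames <- c("var2", "var3", "var4")

# One call to combn() per subset size (1, 2, 3); simplify = FALSE makes
# combn() return a list of character vectors instead of a matrix
myGroups <- unlist(
  Map(combn, list(groupNames), seq_along(groupNames), simplify = FALSE),
  recursive = FALSE
)

length(myGroups)   # 7 non-empty subsets of three variables (2^3 - 1)
lengths(myGroups)  # subset sizes: 1 1 1 2 2 2 3
myGroups[[4]]      # "var2" "var3"
```

So elements 1 to 3 are the single variables, elements 4 to 6 the pairs, and element 7 all three variables together.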

My plan was to make separate data sets for each variable combination with a for() loop, something like

### This Does Not Work
for (i in 1:length(myGroups)){
     assign( myGroups[i]%>%
             unlist() %>%
             paste0(collapse = "")%>%
             paste0("Data"), 
               myData %>% 
               group_by_(lapply(myGroups[[i]], as.symbol)) %>%
               summarise( n = length(var1), 
                             avgVar2 = var2 %>%
                                       mean()))
}

Admittedly I am not very good with lists, and looking up this issue was a bit challenging since dplyr updates have changed how grouping works.

If there is a better way to do this than separate data sets I would love to know.

I've gotten a loop similar to above working when I am only grouping by a single variable.

Any and all help is greatly appreciated! Thank you!

Solution

This seems convoluted, and there's probably a way to simplify it or fancy it up with a do, but it works. Using your myData and myGroups,

results = lapply(myGroups, FUN = function(x) {
    do.call(what = group_by_, args = c(list(myData), x)) %>%
        summarise( n = length(var1), 
                   avgVar1 = mean(var1))
    }
)

> results[[1]]
Source: local data frame [3 x 3]

  var2  n     avgVar1
1    a 31  0.38929738
2    b 31 -0.07451717
3    c 38 -0.22522129

> results[[4]]
Source: local data frame [9 x 4]
Groups: var2

  var2 var3  n    avgVar1
1    a    A 11 -0.1159160
2    a    B 11  0.5663312
3    a    C  9  0.7904056
4    b    A  7  0.0856384
5    b    B 13  0.1309756
6    b    C 11 -0.4192895
7    c    A 15 -0.2783099
8    c    B 10 -0.1110877
9    c    C 13 -0.2517602

> results[[7]]
# I won't paste them here, but it has all 27 rows, grouped by var2, var3 and var4.

I changed your summarise call to average var1 since var2 isn't numeric.
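
The underscore verbs such as group_by_ were deprecated in later dplyr releases. The following is a sketch of the same idea, not the original answer's code, assuming dplyr 1.0.0 or newer, where across() and all_of() accept a character vector of column names. The set.seed() call and the list names (such as "var2_var3") are additions for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
myData <- tibble(
  var1 = rnorm(100),
  var2 = factor(sample(letters[1:3], 100, replace = TRUE)),
  var3 = factor(sample(LETTERS[1:3], 100, replace = TRUE)),
  var4 = factor(sample(month.abb[1:3], 100, replace = TRUE))
)

groupNames <- names(myData)[2:4]
myGroups <- unlist(
  Map(combn, list(groupNames), seq_along(groupNames), simplify = FALSE),
  recursive = FALSE
)

# Illustrative naming: label each combination by its variables, e.g. "var2_var3"
names(myGroups) <- vapply(myGroups, paste, character(1), collapse = "_")

# One summary table per combination; lapply() keeps the list names
results <- lapply(myGroups, function(vars) {
  myData %>%
    group_by(across(all_of(vars))) %>%
    summarise(n = n(), avgVar1 = mean(var1), .groups = "drop")
})

names(results)
results[["var2_var3"]]
```

Keeping the summaries in one named list avoids creating separate objects with assign(), and each table can be pulled out by name.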
