Grouping Over All Possible Combinations of Several Variables With dplyr

Views: 136 Published: 2017/7/13 21:04:03 Tags: r dplyr summary

Problem description

Given a situation such as the following:

library(dplyr)
myData <- tbl_df(data.frame( var1 = rnorm(100), 
                             var2 = letters[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor(), 
                             var3 = LETTERS[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor(), 
                             var4 = month.abb[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor()))

I would like to group myData and eventually compute summary data for every possible combination of the grouping variables var2, var3, and var4.

I can create a list with all possible combinations of variables as character values with

groupNames <- names(myData)[2:4]

myGroups <- Map(combn, 
              list(groupNames), 
              seq_along(groupNames),
              simplify = FALSE) %>%
              unlist(recursive = FALSE)
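
For readers unsure what myGroups contains, here is a minimal base-R sketch of the same construction. It assumes the three column names from the question and swaps Map() for lapply(), which is only a stylistic choice and should produce the same list.

```r
# Character vector of the grouping columns, as in the question
groupNames <- c("var2", "var3", "var4")

# For each subset size m = 1, 2, 3, combn() returns a list of the
# m-element subsets; unlist(recursive = FALSE) flattens the three
# lists into a single list of character vectors
myGroups <- unlist(
  lapply(seq_along(groupNames),
         function(m) combn(groupNames, m, simplify = FALSE)),
  recursive = FALSE
)

length(myGroups)  # 3 singles + 3 pairs + 1 triple = 7 combinations
myGroups[[4]]     # the first pair: "var2" "var3"
```

Each element is a character vector of column names, so anything that groups by it has to accept names as strings rather than bare symbols.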

My plan was to make separate data sets for each variable combination with a for() loop, something like

### This Does Not Work
for (i in 1:length(myGroups)){
     assign( myGroups[i]%>%
             unlist() %>%
             paste0(collapse = "")%>%
             paste0("Data"), 
               myData %>% 
               group_by_(lapply(myGroups[[i]], as.symbol)) %>%
               summarise( n = length(var1), 
                             avgVar2 = var2 %>%
                                       mean()))
}

Admittedly I am not very good with lists, and looking up this issue was a bit challenging since dplyr updates have changed how grouping works.

If there is a better way to do this than separate data sets I would love to know.

I've gotten a loop similar to the one above working when grouping by a single variable.

Any and all help is greatly appreciated! Thank you!

Solution

This seems convoluted, and there's probably a way to simplify it or fancy it up with do(), but it works. Using your myData and myGroups:

results = lapply(myGroups, FUN = function(x) {
    do.call(what = group_by_, args = c(list(myData), x)) %>%
        summarise( n = length(var1), 
                   avgVar1 = mean(var1))
    }
)

> results[[1]]
Source: local data frame [3 x 3]

  var2  n     avgVar1
1    a 31  0.38929738
2    b 31 -0.07451717
3    c 38 -0.22522129

> results[[4]]
Source: local data frame [9 x 4]
Groups: var2

  var2 var3  n    avgVar1
1    a    A 11 -0.1159160
2    a    B 11  0.5663312
3    a    C  9  0.7904056
4    b    A  7  0.0856384
5    b    B 13  0.1309756
6    b    C 11 -0.4192895
7    c    A 15 -0.2783099
8    c    B 10 -0.1110877
9    c    C 13 -0.2517602

> results[[7]]
# I won't paste them here, but it has all 27 rows, grouped by var2, var3 and var4.

I changed your summarise call to average var1 since var2 isn't numeric.
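
The underscore verbs such as group_by_ were deprecated in later dplyr releases. Below is a rough sketch of the same idea for dplyr 1.0.0 or newer, using group_by(across(all_of(x))). The seed, the "var2_var3"-style list names, and the use of tibble() in place of tbl_df() are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible
myData <- tibble(
  var1 = rnorm(100),
  var2 = factor(sample(letters[1:3], 100, replace = TRUE)),
  var3 = factor(sample(LETTERS[1:3], 100, replace = TRUE)),
  var4 = factor(sample(month.abb[1:3], 100, replace = TRUE))
)

# All non-empty combinations of the three grouping columns
groupNames <- names(myData)[2:4]
myGroups <- unlist(
  lapply(seq_along(groupNames),
         function(m) combn(groupNames, m, simplify = FALSE)),
  recursive = FALSE
)

# Name each element after its grouping columns, e.g. "var2_var3"
# (naming scheme is an assumption made for this sketch)
names(myGroups) <- vapply(myGroups, paste, character(1), collapse = "_")

# One summary table per combination; all_of() selects columns by the
# strings stored in x, and .groups = "drop" returns ungrouped results
results <- lapply(myGroups, function(x) {
  myData %>%
    group_by(across(all_of(x))) %>%
    summarise(n = n(), avgVar1 = mean(var1), .groups = "drop")
})

results[["var2_var3"]]
```

Keeping the summaries in one named list avoids the assign() calls from the question, and a single table can still be pulled out by name when needed.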
