dplyr issues when using group_by(multiple variables)

Posted: 2017/7/13 20:13:10 · Tags: r, group-by, dplyr, compound-key


Problem description

I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation).

For example, why does "group_by" not seem to work as it's supposed to when I try to mutate() something?

Looking at mtcars:

library(dplyr)  # mtcars itself ships with base R (datasets package), so car isn't needed

Say I make a data.frame which is a summary of mtcars, grouped by "cyl" and "gear":

# %>% replaces dplyr's original %.% chaining operator, which has been deprecated
df1 <- mtcars %>%
            group_by(cyl, gear) %>%
            summarise(
                newvar = sum(wt)
            )

Then say I want to further summarise this dataframe. With ddply, it'd be straightforward, but when I try to do this with dplyr, it's not actually "grouping by":

df2 <- df1 %>%
            group_by(cyl) %>%
            mutate(
                newvar2 = newvar + 5
            )

Still yields an ungrouped output:

  cyl gear newvar newvar2
1   6    3  6.675  11.675
2   4    4 19.025  24.025
3   6    4 12.375  17.375
4   6    5  2.770   7.770
5   4    3  2.465   7.465
6   8    3 49.249  54.249
7   4    5  3.653   8.653
8   8    5  6.740  11.740

Am I doing something wrong with the syntax?

Edit:

If I were to do this with plyr and ddply:

df1 <- ddply(mtcars, .(cyl, gear), summarise, newvar = sum(wt))

and then to get the second df:

df2 <- ddply(df1, .(cyl), summarise, newvar2 = sum(newvar) + 5)

But that same approach, with sum(newvar) + 5 in the summarise() function doesn't work with dplyr...

Solution

Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It removes grouping variables in the reverse order in which you applied them, so after summarising by cyl and gear the result is still grouped by cyl, and you can just use:

library(dplyr)

mtcars %>%
 group_by(cyl, gear) %>%
 summarise(newvar = sum(wt)) %>%   # result is still grouped by cyl
 summarise(newvar2 = sum(newvar) + 5)

Note that this will give a different answer if you use group_by(gear, cyl) in the second line, because the intermediate result would then be grouped by gear rather than cyl.
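A minimal sketch that makes the peeling visible. It assumes dplyr 1.0.0 or later, where summarise() accepts a .groups argument and group_vars() reports the remaining grouping. Passing .groups = "drop_last" just spells out the default behaviour. The names by_cyl and by_gear are illustrative only.

```r
library(dplyr)

# Summarise by (cyl, gear): the last grouping variable (gear) is peeled off
by_cyl <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")
group_vars(by_cyl)   # "cyl"

# Same data, opposite order: now cyl is peeled off and gear remains
by_gear <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")
group_vars(by_gear)  # "gear"

# The second summarise() therefore collapses to one row per cyl in the
# first case and one row per gear in the second, giving different totals
by_cyl %>% summarise(newvar2 = sum(newvar) + 5)
by_gear %>% summarise(newvar2 = sum(newvar) + 5)
```

Both results have three rows, because mtcars has three cyl values and three gear values. The totals differ because they are summed over different groupings.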

And to get your first attempt working:

df1 <- mtcars %>%
 group_by(cyl, gear) %>%
 summarise(newvar = sum(wt))

df2 <- df1 %>%
 group_by(cyl) %>%
 summarise(newvar2 = sum(newvar) + 5)  # summarise(), not mutate(): one row per cyl
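The first attempt looked ungrouped because mutate() respects the grouping but always returns one row per input row. summarise() instead returns one row per group. In the question, newvar + 5 is element-wise, so it gives the same result with or without grouping. The grouping only becomes visible once an aggregating function such as sum() is involved. This is a minimal sketch under the same dplyr 1.0.0+ assumption. The names with_mutate and with_summarise are illustrative only.

```r
library(dplyr)

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop")

# mutate(): the group-wise sum is computed per cyl, then repeated on every
# row of that group, so all 8 (cyl, gear) rows are kept
with_mutate <- df1 %>%
  group_by(cyl) %>%
  mutate(newvar2 = sum(newvar) + 5) %>%
  ungroup()

# summarise(): each cyl group collapses to a single row
with_summarise <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)

nrow(with_mutate)     # 8
nrow(with_summarise)  # 3
```

Use mutate() with a grouped aggregate when you want each row to carry its group total. Use summarise() when, like ddply(..., summarise, ...), you want one row per group.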
