dplyr issues when using group_by(multiple variables)


Question

I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation).

For example, why does group_by() not seem to work as it's supposed to when I then call mutate()?

Looking at mtcars:

library(dplyr)  # provides group_by(), summarise(), mutate(); mtcars itself ships with base R

Say I make a data.frame which is a summary of mtcars, grouped by "cyl" and "gear":

df1 <- mtcars %>%   # %>% replaces the old %.% operator, which later dplyr releases removed
  group_by(cyl, gear) %>%
  summarise(
    newvar = sum(wt)
  )

Then say I want to further summarise this data frame. With ddply, it'd be straightforward, but when I try to do it with dplyr, it's not actually "grouping by":

df2 <- df1 %>%
  group_by(cyl) %>%
  mutate(
    newvar2 = newvar + 5
  )

Still yields an ungrouped output:

  cyl gear newvar newvar2
1   6    3  6.675  11.675
2   4    4 19.025  24.025
3   6    4 12.375  17.375
4   6    5  2.770   7.770
5   4    3  2.465   7.465
6   8    3 49.249  54.249
7   4    5  3.653   8.653
8   8    5  6.740  11.740

Am I doing something wrong with the syntax?


Edit:

If I were to do this with plyr and ddply:

df1 <- ddply(mtcars, .(cyl, gear), summarise, newvar = sum(wt))

and then to get the second df:

df2 <- ddply(df1, .(cyl), summarise, newvar2 = sum(newvar) + 5)

But the same approach, with sum(newvar) + 5 inside summarise(), doesn't work with dplyr...

Solution

Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It removes grouping variables in the reverse of the order you applied them (the last one goes first), so you can just use

mtcars %>%
 group_by(cyl, gear) %>%
 summarise(newvar = sum(wt)) %>%
 summarise(newvar2 = sum(newvar) + 5)

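Note that this will give a different answer if you use group_by(gear, cyl) in the second line: the first summarise() then leaves the data grouped by gear, so the second one totals per gear rather than per cyl.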
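To see the peeling directly, here is a minimal sketch, assuming dplyr 1.0.0 or later (needed for the `.groups` argument of summarise(); `group_vars()` reports which grouping variables remain). The object names `by_cyl_gear`, `by_gear_cyl`, `per_cyl` and `per_gear` are illustrative only.

```r
library(dplyr)

# Grouped by cyl, then gear. summarise() drops the last grouping
# variable (gear), so the result is still grouped by cyl.
# .groups = "drop_last" just makes that default explicit.
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")
group_vars(by_cyl_gear)  # "cyl"

# Same data, grouping order reversed: now gear is what remains.
by_gear_cyl <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")
group_vars(by_gear_cyl)  # "gear"

# The second summarise() therefore totals over different groups.
per_cyl  <- summarise(by_cyl_gear, newvar2 = sum(newvar) + 5)  # one row per cyl
per_gear <- summarise(by_gear_cyl, newvar2 = sum(newvar) + 5)  # one row per gear
per_cyl
per_gear
```

Both results have three rows, but `per_cyl` is keyed by cyl (4, 6, 8) and `per_gear` by gear (3, 4, 5), which is why the grouping order matters.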

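And to get your first attempt working, use summarise() instead of mutate(). mutate() always returns one row per input row (grouping only changes what functions like sum() see inside each group), whereas summarise() collapses each group to a single row: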

df1 <- mtcars %>%
 group_by(cyl, gear) %>%
 summarise(newvar = sum(wt))

df2 <- df1 %>%
 group_by(cyl) %>%
 summarise(newvar2 = sum(newvar) + 5)
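The difference between the two verbs can be checked side by side. This is a minimal sketch, assuming dplyr 1.0.0 or later (for `.groups = "drop"`); the names `with_totals`, `collapsed` and `cyl_total` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# The per-(cyl, gear) summary from the question, fully ungrouped.
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop")

# mutate() never changes the number of rows. Grouping by cyl only
# changes what sum() sees, so every row gets its own cyl's total.
# (cyl_total is an illustrative column name.)
with_totals <- df1 %>%
  group_by(cyl) %>%
  mutate(cyl_total = sum(newvar) + 5) %>%
  ungroup()

# summarise() collapses each cyl group to a single row.
collapsed <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)

nrow(with_totals)  # 8: one row per (cyl, gear) combination
nrow(collapsed)    # 3: one row per cyl
```

The mutate() form is useful when you want the group total next to each detail row (for example, to compute shares); the summarise() form is the counterpart of `ddply(..., summarise, ...)`.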
