dplyr issues when using group_by(multiple variables)
Question
I want to start using dplyr in place of ddply but I can't get a handle on how it works (I've read the documentation).
For example, why when I try to mutate() something does the "group_by" function not work as it's supposed to?
Looking at mtcars:
library(dplyr)  # mtcars itself ships with base R's datasets package
Say I make a data.frame which is a summary of mtcars, grouped by "cyl" and "gear":
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(
    newvar = sum(wt)
  )
Then say I want to further summarise this data frame. With ddply, it'd be straightforward, but when I try to do it with dplyr, it's not actually "grouping by":
df2 <- df1 %>%
  group_by(cyl) %>%
  mutate(
    newvar2 = newvar + 5
  )
Still yields an ungrouped output:
  cyl gear newvar newvar2
1   6    3  6.675  11.675
2   4    4 19.025  24.025
3   6    4 12.375  17.375
4   6    5  2.770   7.770
5   4    3  2.465   7.465
6   8    3 49.249  54.249
7   4    5  3.653   8.653
8   8    5  6.740  11.740
Is there something wrong with my syntax?
If I were to do this with plyr and ddply:
df1 <- ddply(mtcars, .(cyl, gear), summarise, newvar = sum(wt))
Then to get the second df:
df2 <- ddply(df1, .(cyl), summarise, newvar2 = sum(newvar) + 5)
But that same approach, with sum(newvar) + 5 inside summarise(), doesn't work with dplyr...
Recommended answer
Taking Dickoa's answer one step further: as Hadley says, "summarise peels off a single layer of grouping". It peels off the grouping variables in the reverse of the order in which you applied them, so you can just use
mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)
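To watch the peeling happen, here is a minimal sketch that checks the grouping variables after each step with `group_vars()`. The intermediate names `by_cyl_gear`, `step1`, and `step2` are made up for illustration, and the behaviour shown assumes a reasonably recent dplyr with its default grouping behaviour for `summarise()`.

```r
library(dplyr)

# Start with two layers of grouping: cyl, then gear
by_cyl_gear <- mtcars %>% group_by(cyl, gear)
group_vars(by_cyl_gear)   # both cyl and gear

# The first summarise() drops the last grouping variable (gear)
step1 <- by_cyl_gear %>% summarise(newvar = sum(wt))
group_vars(step1)         # only cyl remains

# The second summarise() drops cyl, leaving an ungrouped result
step2 <- step1 %>% summarise(newvar2 = sum(newvar) + 5)
group_vars(step2)         # no grouping variables left
```

Recent dplyr versions also print a message after the first `summarise()` naming the grouping that is left, which is another way to confirm what was peeled off.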
Note that this will give a different answer if you use group_by(gear, cyl) in the second line, because gear is then the variable left over after the first summarise().
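As a rough illustration of that order dependence, the sketch below runs the same pipeline with both orderings. The names `by_cyl` and `by_gear` are hypothetical, and the sketch relies on the default behaviour of `summarise()` dropping the last grouping variable.

```r
library(dplyr)

# Grouping by cyl first: the final result has one row per cyl
by_cyl <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)

# Grouping by gear first: the final result has one row per gear
by_gear <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(newvar = sum(wt)) %>%
  summarise(newvar2 = sum(newvar) + 5)

names(by_cyl)   # the key column is cyl
names(by_gear)  # the key column is gear
```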
And to get your first attempt working:
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))
df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)
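The original attempt probably looked ungrouped because `mutate()` keeps one row per input row, while `summarise()` collapses each group to a single row. Here is a small sketch of the difference. The names `with_mutate` and `with_summarise` are illustrative, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Summarise by cyl and gear, then drop the grouping explicitly
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop")

# mutate() adds the per-cyl total to every row of df1
with_mutate <- df1 %>%
  group_by(cyl) %>%
  mutate(newvar2 = sum(newvar) + 5)

# summarise() returns one row per cyl
with_summarise <- df1 %>%
  group_by(cyl) %>%
  summarise(newvar2 = sum(newvar) + 5)

nrow(with_mutate)     # same number of rows as df1
nrow(with_summarise)  # one row per distinct cyl
```

With the question's `newvar2 = newvar + 5`, the calculation is element-wise, so grouping cannot change the values. The grouping only matters once an aggregate such as `sum()` is involved.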