Arranging a grouped_df by a grouping variable does not work
Question
I have a data.frame that contains client names, years, and a revenue figure for each client in each year.
df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year = rep(c(2014, 2013, 2012), each = 3),
  rev = rep(c(10, 20, 30), 3)
)
I want to end up with a data.frame that aggregates the revenue by client and year. I then want to sort that data.frame by year and, within each year, by descending revenue.
library(dplyr)
df1 <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  arrange(year, desc(tot))
However, with the code above, arrange() does not change the order of the grouped data.frame at all. When I run the code below, which coerces the result to a plain data.frame first, it works.
library(dplyr)
df1 <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  data.frame() %>%
  arrange(year, desc(tot))
Am I missing something, or will I need to do this every time I arrange a grouped_df by a grouping variable?
R version: 3.1.1; dplyr package version: 0.3.0.2
EDIT 11/13/2017: As noted by lucacerone, beginning with dplyr 0.5, arrange() once again ignores groups when sorting, so my original code now works the way I initially expected it to.
arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it's not going to change again, as more changes will just cause more confusion.
Recommended answer
Try switching the order of the variables in your group_by() call:
df %>%
  group_by(year, client) %>%
  summarise(tot = sum(rev)) %>%
  arrange(year, desc(tot))
I think arrange() is ordering within groups. After summarise(), the last grouping variable is dropped, so in your first example the result is still grouped by client and arrange() sorts the rows within each client group. Switching the order to group_by(year, client) seems to fix it because client is then the grouping variable that summarise() drops, leaving the result grouped by year.
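The point about which grouping variable survives summarise() can be checked directly. The following is a minimal sketch, assuming a dplyr version that provides group_vars() (newer than the 0.3.0.2 in the question) and relying on summarise()'s default behaviour of dropping only the last grouping variable; the names by_client and by_year are illustrative only.

```r
library(dplyr)

df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# Grouped by (client, year): summarise() drops `year`, so `client` remains.
by_client <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev))

# Grouped by (year, client): summarise() drops `client`, so `year` remains.
by_year <- df %>%
  group_by(year, client) %>%
  summarise(tot = sum(rev))

group_vars(by_client)  # the grouping left over in the question's pipeline
group_vars(by_year)    # the grouping left over after switching the order
```

Recent dplyr versions may print a message about the resulting grouping when summarise() runs; it is informational and does not change the result.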
Alternatively, there is the ungroup() function:
df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  ungroup() %>%
  arrange(year, desc(tot))
Edit, @lucacerone: since dplyr 0.5 this does not work anymore:
Breaking changes: arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it's not going to change again, as more changes will just cause more confusion.
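To illustrate what the quoted change means in practice, here is a small sketch assuming a recent dplyr release. The .by_group argument of arrange() is not mentioned in this thread; it was added to dplyr later than 0.5 (as far as I know) and is shown only as the opt-in way to get the old within-group ordering back. The names grouped, ignoring_groups and within_groups are illustrative.

```r
library(dplyr)

df <- data.frame(
  client = rep(c("Client A", "Client B", "Client C"), 3),
  year   = rep(c(2014, 2013, 2012), each = 3),
  rev    = rep(c(10, 20, 30), 3)
)

# After summarise() the result is still grouped by `client`.
grouped <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev))

# Default on current dplyr: the grouping is ignored, so rows are ordered
# by year and then by descending total across all clients.
ignoring_groups <- grouped %>% arrange(year, desc(tot))

# Opt-in: sort by the grouping variable (client) first, then by year and
# descending total inside each client.
within_groups <- grouped %>% arrange(year, desc(tot), .by_group = TRUE)
```

With the default, no ungroup() or data.frame() workaround is needed to get the ordering the question asked for.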