Arrange a grouped_df by group variable not working

Question

I have a data.frame that contains client names, years, and several revenue numbers from each year.

df <- data.frame(client = rep(c("Client A","Client B", "Client C"),3), 
                 year = rep(c(2014,2013,2012), each=3), 
                 rev = rep(c(10,20,30),3)
                )

I want to end up with a data.frame that aggregates the revenue by client and year. I then want to sort the data.frame by year then by descending revenue.

library(dplyr)
df1 <- df %>% 
        group_by(client, year) %>%
        summarise(tot = sum(rev)) %>%
        arrange(year, desc(tot))

However, when using the code above the arrange() function doesn't change the order of the grouped data.frame at all. When I run the below code and coerce to a normal data.frame it works.

library(dplyr)
df1 <- df %>% 
        group_by(client, year) %>%
        summarise(tot = sum(rev)) %>%
        data.frame() %>%
        arrange(year, desc(tot))

Am I missing something or will I need to do this every time when trying to arrange a grouped_df by a grouped variable?

R Version: 3.1.1 dplyr package version: 0.3.0.2

EDIT 11/13/2017: As noted by lucacerone, beginning with dplyr 0.5, arrange once again ignores groups when sorting. So my original code now works in the way I initially expected it would.

arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.

Recommended answer

Try switching the order of your group_by statement:

df %>% 
  group_by(year, client) %>%
  summarise(tot = sum(rev)) %>%
  arrange(year, desc(tot))

I think arrange is ordering within groups; after summarize, the last group is dropped, so this means in your first example it's arranging rows within the client group. Switching the order to group_by(year, client) seems to fix it because the client group gets dropped after summarize.
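One way to check this explanation is to look at which grouping variables are left after summarise(). The sketch below rebuilds the df from the question and calls group_vars(), which exists in later dplyr releases than the 0.3.0.2 used in the question (so this assumes a reasonably recent install); the names remaining_cy and remaining_yc are just illustrative.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev = rep(c(10, 20, 30), 3))

# summarise() peels off the last grouping variable, so the result of
# group_by(client, year) is still grouped by client
remaining_cy <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  group_vars()

# With the order switched, the leftover grouping variable is year
remaining_yc <- df %>%
  group_by(year, client) %>%
  summarise(tot = sum(rev)) %>%
  group_vars()

print(remaining_cy)
print(remaining_yc)
```

If arrange() sorts within groups, as it did in the version from the question, the first pipeline sorts inside each client and the second inside each year, which matches what was observed.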

Alternatively, there is the ungroup() function:

df %>% 
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  ungroup() %>%
  arrange(year, desc(tot))
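On newer dplyr releases (1.0.0 and later, which is an assumption about your setup and well past the version in the question), summarise() also takes a .groups argument, so the grouping can be dropped in the same step instead of calling ungroup() separately. A minimal sketch, with totals as an illustrative name:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev = rep(c(10, 20, 30), 3))

totals <- df %>%
  group_by(client, year) %>%
  # .groups = "drop" returns an ungrouped result (needs dplyr >= 1.0.0)
  summarise(tot = sum(rev), .groups = "drop") %>%
  arrange(year, desc(tot))

print(totals)
```

The result is ordered by year first and by descending tot within each year, the same as the ungroup() version.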


Edit, @lucacerone: since dplyr 0.5 this no longer applies, because arrange() ignores groups again:

Breaking changes: arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.
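If you do want the old within-group ordering on a current dplyr, more recent releases give arrange() a .by_group argument (FALSE by default). The sketch below assumes a dplyr version that has this argument; by_client is just an illustrative name.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev = rep(c(10, 20, 30), 3))

by_client <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%   # result is still grouped by client
  # .by_group = TRUE sorts by the grouping variable (client) first,
  # then by year and descending tot inside each client
  arrange(year, desc(tot), .by_group = TRUE)

print(by_client)
```

Leaving .by_group at its default gives the group-ignoring sort described in the release note above.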
