Arrange a grouped_df by group variable not working

Problem description

I have a data.frame that contains client names, years, and several revenue numbers from each year.

df <- data.frame(client = rep(c("Client A","Client B", "Client C"),3), 
                 year = rep(c(2014,2013,2012), each=3), 
                 rev = rep(c(10,20,30),3)
                )

I want to end up with a data.frame that aggregates the revenue by client and year. I then want to sort the data.frame by year then by descending revenue.
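
For reference, the target result can be checked without any grouped object at all. The sketch below is not from the question: it swaps in base R's aggregate() and order() to sum revenue per client and year, then sort by year ascending and by total revenue descending. The names agg and tot are illustrative.

```r
# Same toy data as in the question; stringsAsFactors = FALSE keeps
# client as character on R versions before 4.0
df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev = rep(c(10, 20, 30), 3),
                 stringsAsFactors = FALSE)

# Sum revenue for each client/year combination (no grouping metadata involved)
agg <- aggregate(rev ~ client + year, data = df, FUN = sum)
names(agg)[names(agg) == "rev"] <- "tot"

# Sort by year ascending, then by total revenue descending
agg <- agg[order(agg$year, -agg$tot), ]
rownames(agg) <- NULL
print(agg)
```

Within each year the rows come out as Client C, Client B, Client A, which is the ordering the dplyr pipeline below is meant to produce.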

library(dplyr)
df1 <- df %>% 
        group_by(client, year) %>%
        summarise(tot = sum(rev)) %>%
        arrange(year, desc(tot))

However, with the code above, arrange() does not change the order of the grouped data.frame at all. When I run the code below, which coerces the result to a plain data.frame before arranging, it works.

library(dplyr)
df1 <- df %>% 
        group_by(client, year) %>%
        summarise(tot = sum(rev)) %>%
        data.frame() %>%
        arrange(year, desc(tot))

Am I missing something, or will I need to do this every time I arrange a grouped_df by a grouping variable?

R version: 3.1.1; dplyr package version: 0.3.0.2

EDIT 11/13/2017: As noted by lucacerone, beginning with dplyr 0.5, arrange() once again ignores groups when sorting, so my original code now works the way I initially expected.

From the dplyr 0.5.0 release notes:

arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.
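
A minimal sketch of the current behaviour, assuming dplyr 1.0 or later (for the .groups argument of summarise() and the .by_group argument of arrange(); the versions used in the question do not have them). The names grouped, ignored and by_group are illustrative. By default arrange() sorts the whole table even though the summarised result is still grouped by client; passing .by_group = TRUE sorts by the grouping variable first, which leaves the rows in the same client-then-year order that summarise() produced, i.e. the symptom the question describes.

```r
library(dplyr)  # assumes dplyr >= 1.0

df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev = rep(c(10, 20, 30), 3),
                 stringsAsFactors = FALSE)

# summarise() drops `year`, so the result is still grouped by `client`
grouped <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev), .groups = "drop_last")

# Default: the client grouping is ignored and the whole table is sorted
ignored <- grouped %>% arrange(year, desc(tot))

# Opt-in: sort by the grouping variable (client) first, then year, then tot
by_group <- grouped %>% arrange(year, desc(tot), .by_group = TRUE)

print(ignored)
print(by_group)
```

Here ignored lists Client C, B, A within each year from 2012 to 2014, while by_group lists all of Client A's years first, then Client B's, then Client C's.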

Recommended answer

Try switching the order of your group_by statement:

df %>% 
  group_by(year, client) %>%
  summarise(tot = sum(rev)) %>%
  arrange(year, desc(tot))

I think arrange() is ordering within groups. summarise() drops the last grouping variable, so in your first example the result is still grouped by client, and arrange() sorts rows within each client group. Switching the order to group_by(year, client) seems to fix it because client is then the variable that summarise() drops, leaving the result grouped by year only.
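
The explanation above can be checked with group_vars(), which reports the grouping still attached to a result. This is a sketch assuming dplyr 1.0 or later; .groups = "drop_last" only makes explicit the dropping of the last grouping variable that summarise() does here by default, and the object names are illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0

df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev = rep(c(10, 20, 30), 3),
                 stringsAsFactors = FALSE)

# group_by(client, year): summarise() drops `year`, leaving `client`
by_client_year <- df %>%
  group_by(client, year) %>%
  summarise(tot = sum(rev), .groups = "drop_last")

# group_by(year, client): summarise() drops `client`, leaving `year`
by_year_client <- df %>%
  group_by(year, client) %>%
  summarise(tot = sum(rev), .groups = "drop_last")

print(group_vars(by_client_year))           # "client"
print(group_vars(by_year_client))           # "year"

# ungroup() removes the remaining grouping entirely
print(group_vars(ungroup(by_client_year)))  # character(0)
```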

Alternatively, there is also the ungroup() function:

df %>% 
  group_by(client, year) %>%
  summarise(tot = sum(rev)) %>%
  ungroup() %>%
  arrange(year, desc(tot))

Edit, @lucacerone: since dplyr 0.5, the behaviour described above no longer applies:

Breaking changes: arrange() once again ignores grouping, reverting back to the behaviour of dplyr 0.3 and earlier. This makes arrange() inconsistent with other dplyr verbs, but I think this behaviour is generally more useful. Regardless, it’s not going to change again, as more changes will just cause more confusion.
