How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

Question

I started getting a new message (see post title) when running group_by() and summarise() after updating to dplyr development version 0.8.99.9003.

Here is an example to recreate the output:

library(tidyverse)
library(hablar)
df <- read_csv("year, week, rat_house_females, rat_house_males, mouse_wild_females, mouse_wild_males 
               2018,10,1,1,1,1
               2018,10,1,1,1,1
               2018,11,2,2,2,2
               2018,11,2,2,2,2
               2019,10,3,3,3,3
               2019,10,3,3,3,3
               2019,11,4,4,4,4
               2019,11,4,4,4,4") %>% 
  convert(chr(year,week)) %>% 
  mutate(total_rodents = rowSums(select_if(., is.numeric))) %>% 
  convert(num(year,week)) %>% 
  group_by(year,week) %>% summarise(average = mean(total_rodents))

The output tibble is correct, but this message appears:

summarise() regrouping output by 'year' (override with .groups argument)

How should this be interpreted? Why does it report regrouping only by 'year' when I grouped by both year and week? Also, what does it mean to override and why would I want to do that?

I don't think the message indicates a problem because it appears throughout the dplyr vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

I believe it is a new message because it has only appeared on very recent SO questions such as How to melt pairwise.wilcox.test output using dplyr? and R Aggregate over multiple columns (neither of which addresses the regrouping/override message).

Thanks!

Recommended answer

It is just a friendly informational message, not an error. By default, if the data are grouped before summarise(), the result drops one grouping variable: the last one specified in group_by(). With a single grouping variable, the result has no grouping attribute after summarise(). With more than one (two here), the number of grouping variables is reduced by one, so the result keeps 'year' as its grouping attribute. As a reproducible example:

library(dplyr)
mtcars %>%
     group_by(am) %>% 
     summarise(mpg = sum(mpg))
#`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
#     am   mpg
#* <dbl> <dbl>
#1     0  326.
#2     1  317.

The message says that the output is being ungrouped, i.e. when group_by() has a single variable, that grouping is dropped after summarise().

mtcars %>% 
   group_by(am, vs) %>% 
   summarise(mpg = sum(mpg))
#`summarise()` regrouping output by 'am' (override with `.groups` argument)
# A tibble: 4 x 3
# Groups:   am [2]
#     am    vs   mpg
#  <dbl> <dbl> <dbl>
#1     0     0  181.
#2     0     1  145.
#3     1     0  118.
#4     1     1  199.

Here, it drops the last grouping variable ('vs') and keeps the output grouped by 'am'.
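One way to confirm which grouping survives is to inspect the result directly rather than rely on the printed header. The following is a minimal sketch using the same mtcars example; the object name res is only illustrative, and .groups = "drop_last" is spelled out to make the default explicit.

```r
library(dplyr)

# Summarise over two grouping variables; "drop_last" is written out
# only to make the default behaviour explicit (and to silence the message).
res <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = "drop_last")

# Grouping variables still attached to the result
group_vars(res)
# [1] "am"
```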

If we check ?summarise, there is a .groups argument, which in the usual case (every group summarised to a single row) behaves as "drop_last"; the other options are "drop", "keep", and "rowwise":

.groups - Grouping structure of the result.

"drop_last": dropping the last level of grouping. This was the only supported option before version 1.0.0.

"drop": All levels of grouping are dropped.

"keep": Same grouping structure as .data.

"rowwise": Each row is its own group.

When .groups is not specified, you either get "drop_last" when all the results are size 1, or "keep" if the size varies. In addition, a message informs you of that choice, unless the option "dplyr.summarise.inform" is set to FALSE.
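The option mentioned in that help text can be set for a whole session when the message is not wanted. A minimal sketch, assuming the previous option value should be restored afterwards (the names old and res are illustrative only):

```r
library(dplyr)

# Switch the informational message off and remember the previous setting.
old <- options(dplyr.summarise.inform = FALSE)

# Same default grouping behaviour as before, but no message is printed.
res <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))

# Put the option back to whatever it was before.
options(old)
```

Note that this only hides the message; the result is still grouped by 'am', so setting .groups explicitly is usually the clearer choice in shared code.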

That is, if we set .groups explicitly in summarise(), we don't get the message. With .groups = 'drop', all grouping attributes are removed from the result:

mtcars %>% 
    group_by(am) %>%
    summarise(mpg = sum(mpg), .groups = 'drop')
# A tibble: 2 x 2
#     am   mpg
#* <dbl> <dbl>
#1     0  326.
#2     1  317.


mtcars %>%
   group_by(am, vs) %>%
   summarise(mpg = sum(mpg), .groups = 'drop')
# A tibble: 4 x 3
#     am    vs   mpg
#* <dbl> <dbl> <dbl>
#1     0     0  181.
#2     0     1  145.
#3     1     0  118.
#4     1     1  199.


mtcars %>% 
   group_by(am, vs) %>% 
   summarise(mpg = sum(mpg), .groups = 'drop') %>%
   str
#tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
# $ am : num [1:4] 0 0 1 1
# $ vs : num [1:4] 0 1 0 1
# $ mpg: num [1:4] 181 145 118 199

Previously, this message was not issued, which could lead to situations where the OP runs mutate() or something else assuming there is no grouping and gets unexpected output. Now the message tells the user to be careful, because a grouping attribute may still be present.
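To illustrate the kind of surprise described above, here is a minimal sketch in which the same mutate() call gives different answers depending on whether the leftover grouping is still attached. The column name share and the object names are hypothetical, chosen only for this example.

```r
library(dplyr)

totals <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = "drop_last")   # still grouped by 'am'

# sum(mpg) is evaluated within each 'am' group,
# so the shares add up to 1 inside every group.
grouped_share <- totals %>%
  mutate(share = mpg / sum(mpg))

# After ungroup(), sum(mpg) runs over all four rows,
# so the shares add up to 1 across the whole table.
overall_share <- totals %>%
  ungroup() %>%
  mutate(share = mpg / sum(mpg))

sum(grouped_share$share)   # one unit per 'am' group
sum(overall_share$share)   # one unit in total
```

With two values of am, the first total comes out at about 2 and the second at about 1, which is exactly the difference a forgotten grouping attribute can introduce.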

NOTE: The .groups argument is currently marked as experimental in its lifecycle, so its behaviour could be modified in future releases.

Depending on whether we need further transformations of the data based on the same grouping variables (or not), we can choose among the different options in .groups.
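As a rough comparison of those choices, the sketch below runs the same summary with three of the documented .groups values and inspects the grouping each one leaves behind. The object names are illustrative only.

```r
library(dplyr)

by_am_vs <- mtcars %>% group_by(am, vs)

# Same summary, different grouping structure on the result.
kept        <- by_am_vs %>% summarise(mpg = sum(mpg), .groups = "keep")
dropped     <- by_am_vs %>% summarise(mpg = sum(mpg), .groups = "drop")
rowwise_res <- by_am_vs %>% summarise(mpg = sum(mpg), .groups = "rowwise")

group_vars(kept)                     # both 'am' and 'vs' are retained
group_vars(dropped)                  # no grouping variables remain
inherits(rowwise_res, "rowwise_df")  # each row is treated as its own group
```

"keep" suits follow-up steps that work within the original groups, while "drop" is the safer choice when the result is a final table or is passed on to code that assumes ungrouped data.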
