How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

Question

After updating to the dplyr development version 0.8.99.9003, I started getting a new message (see the post title) when running group_by() followed by summarise().

Here is an example that reproduces the message:

library(tidyverse)
library(hablar)
df <- read_csv("year, week, rat_house_females, rat_house_males, mouse_wild_females, mouse_wild_males 
               2018,10,1,1,1,1
               2018,10,1,1,1,1
               2018,11,2,2,2,2
               2018,11,2,2,2,2
               2019,10,3,3,3,3
               2019,10,3,3,3,3
               2019,11,4,4,4,4
               2019,11,4,4,4,4") %>% 
  convert(chr(year,week)) %>% 
  mutate(total_rodents = rowSums(select_if(., is.numeric))) %>% 
  convert(num(year,week)) %>% 
  group_by(year,week) %>% summarise(average = mean(total_rodents))

The output tibble is correct, but this message appears:


`summarise()` regrouping output by 'year' (override with `.groups` argument)

How should this be interpreted? Why does it report regrouping only by 'year' when I grouped by both year and week? Also, what does it mean to override, and why would I want to do that?

I don't think the message indicates a problem, because it appears throughout the dplyr vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

I believe it is a new message because it has only appeared in very recent SO questions, such as "How to melt pairwise.wilcox.test output using dplyr?" and "R Aggregate over multiple columns" (neither of which addresses the regrouping/override message).

Thanks!

Recommended answer

This is just a friendly message, not an error. By default, if the data are grouped before summarise(), the result drops one grouping variable: the last one specified in group_by(). With a single grouping variable, the result has no grouping at all after summarise(). With more than one (two here), the grouping is reduced by one level, so the result is still grouped by 'year'. As a reproducible example:

library(dplyr)
mtcars %>%
     group_by(am) %>% 
     summarise(mpg = sum(mpg))
#`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
#     am   mpg
#* <dbl> <dbl>
#1     0  326.
#2     1  317.

The message says it is ungrouping: when group_by() has a single variable, that grouping is dropped after summarise().

mtcars %>% 
   group_by(am, vs) %>% 
   summarise(mpg = sum(mpg))
#`summarise()` regrouping output by 'am' (override with `.groups` argument)
# A tibble: 4 x 3
# Groups:   am [2]
#     am    vs   mpg
#  <dbl> <dbl> <dbl>
#1     0     0  181.
#2     0     1  145.
#3     1     0  118.
#4     1     1  199.

Here, it drops the last grouping variable ('vs') and the result stays grouped by 'am'.
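One way to confirm what the message reports is to inspect the grouping of the result directly. The following is a minimal sketch using group_vars() from dplyr; the object names one_key and two_keys are made up for illustration.

```r
library(dplyr)

# One grouping variable: the result is ungrouped after summarise()
one_key <- mtcars %>%
  group_by(am) %>%
  summarise(mpg = sum(mpg))

# Two grouping variables: the last one (vs) is dropped, am remains
two_keys <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))

group_vars(one_key)   # character(0)
group_vars(two_keys)  # "am"
```
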

If we check ?summarise, there is a .groups argument whose default behaviour is "drop_last"; the other options are "drop", "keep", and "rowwise".


.groups: Grouping structure of the result.

"drop_last": dropping the last level of grouping. This was the only supported option before version 1.0.0.

"drop": All levels of grouping are dropped.

"keep": Same grouping structure as .data.

"rowwise": Each row is its own group.

When .groups is not specified, you either get "drop_last" when all the results are size 1, or "keep" if the size varies. In addition, a message informs you of that choice, unless the option "dplyr.summarise.inform" is set to FALSE.
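To compare the options side by side, the sketch below summarises the same grouped data with each .groups value and reports which grouping variables remain. The helper groups_after() and the object by_am_vs are hypothetical names introduced only for this illustration; it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

by_am_vs <- mtcars %>% group_by(am, vs)

# Hypothetical helper: summarise with the given .groups value and
# return the grouping variables that remain on the result.
groups_after <- function(option) {
  by_am_vs %>%
    summarise(mpg = sum(mpg), .groups = option) %>%
    group_vars()
}

groups_after("drop_last")  # "am"
groups_after("drop")       # character(0)
groups_after("keep")       # "am" "vs"

# "rowwise" returns a rowwise data frame instead of a grouped one
rowwise_result <- by_am_vs %>%
  summarise(mpg = sum(mpg), .groups = "rowwise")
inherits(rowwise_result, "rowwise_df")  # TRUE
```

Because .groups is passed explicitly in every call, none of these calls prints the regrouping message.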

That is, if we set .groups explicitly in summarise(), we don't get the message. With .groups = 'drop', the grouping attribute is removed entirely:

mtcars %>% 
    group_by(am) %>%
    summarise(mpg = sum(mpg), .groups = 'drop')
# A tibble: 2 x 2
#     am   mpg
#* <dbl> <dbl>
#1     0  326.
#2     1  317.


mtcars %>%
   group_by(am, vs) %>%
   summarise(mpg = sum(mpg), .groups = 'drop')
# A tibble: 4 x 3
#     am    vs   mpg
#* <dbl> <dbl> <dbl>
#1     0     0  181.
#2     0     1  145.
#3     1     0  118.
#4     1     1  199.


mtcars %>% 
   group_by(am, vs) %>% 
   summarise(mpg = sum(mpg), .groups = 'drop') %>%
   str
#tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
# $ am : num [1:4] 0 0 1 1
# $ vs : num [1:4] 0 1 0 1
# $ mpg: num [1:4] 181 145 118 199
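If the default grouping behaviour is what you want and only the message is unwanted, the documentation excerpt above mentions the "dplyr.summarise.inform" option. A minimal sketch, assuming dplyr 1.0.0 or later (the names old_opts and quiet_result are illustrative):

```r
library(dplyr)

# Turn the message off, remembering the previous setting
old_opts <- options(dplyr.summarise.inform = FALSE)

quiet_result <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))

# Only the message is suppressed; the default "drop_last" still applies
group_vars(quiet_result)  # "am"

# Restore the previous setting
options(old_opts)
```

Note that this changes the setting for the whole session until it is restored, whereas .groups documents the intended grouping at each call.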

Previously, this message was not issued, which could lead to situations where the OP does a mutate() or some other operation assuming there is no grouping, and gets unexpected output. Now the message tells the user to be careful because a grouping attribute is still present.
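A small sketch of the kind of surprise described above: the same mutate() gives different results depending on whether the leftover grouping by 'am' is still in place. The column name share and the object names are made up for this example.

```r
library(dplyr)

summed <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))   # result is still grouped by am

# Still grouped: sum(mpg) is computed within each am group,
# so the shares add up to 1 per value of am
within_am <- summed %>%
  mutate(share = mpg / sum(mpg))

# Ungrouped: sum(mpg) is the overall total,
# so the shares add up to 1 across all rows
overall <- summed %>%
  ungroup() %>%
  mutate(share = mpg / sum(mpg))
```

Passing .groups = 'drop' to summarise() has the same effect on the grouping as the ungroup() call in the second pipeline.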

NOTE: The .groups argument is currently experimental in its lifecycle, so its behaviour could change in future releases.

Depending on whether we need any further transformation of the data based on the same grouping variables (or not), we can select among the different options for .groups.
