In R dplyr, why do I need to ungroup() after I count()?


Question

When I first started programming in R, I would often use dplyr count().

library(tidyverse)    
mtcars %>% count(cyl)

Once I started using apply functions, I started running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems would go away.

I don't have a reproducible example to show, but can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or after any group_by()? Of course, I'm assuming I no longer need the data grouped after it's counted or summarized.

mtcars %>% count(cyl) %>% ungroup()

Solution

The issues you used to run into were from an old behavior of count(). Up to dplyr 0.5.0, if you did:

mtcars %>%
  count(cyl, wt)

The result would still be grouped by the cyl column. This means, for example, that if you followed it with something like summarize(mean(n)), you would get one row for each cyl when you may have expected a single row overall. (The count table only has the columns cyl, wt and n, so a column such as am is no longer available at that point.) Adding %>% ungroup() after the count fixed the issue.
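dplyr 0.5.0 is awkward to install today, so the sketch below assumes a recent dplyr (1.0.0 or later) and imitates the old count(cyl, wt) result with group_by() plus summarise(). That combination drops only the last grouping level, which leaves the table grouped by cyl, the shape described above. This is a stand-in for the old output, not a rerun of dplyr 0.5.0. The sketch then summarizes the table with and without ungroup().

```r
library(dplyr)

# Stand-in (assumption) for what count(cyl, wt) returned up to dplyr 0.5.0:
# a count table that is still grouped by cyl.
# .groups = "drop_last" needs dplyr >= 1.0.0 and only makes that explicit.
old_style <- mtcars %>%
  group_by(cyl, wt) %>%
  summarise(n = n(), .groups = "drop_last")

# Summarizing the still-grouped table gives one row per cyl value ...
per_cyl <- old_style %>% summarise(total = sum(n))

# ... while removing the grouping first gives a single overall row.
overall <- old_style %>% ungroup() %>% summarise(total = sum(n))

print(per_cyl)
print(overall)
```

Because the leftover grouping is invisible unless you print the table header, the per-cyl result is easy to mistake for a bug elsewhere in the pipeline, which fits the symptoms described in the question.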

This behavior changed in dplyr 0.7.0 (released in June 2017): count() now preserves the grouping of its input, so mtcars %>% count(wt, cyl) returns an ungrouped table. This is likely why you can no longer reproduce the problems, and it means you no longer need ungroup() after a count() on ungrouped data.
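A quick way to check this, assuming dplyr 0.7.0 or later, is `group_vars()`, which returns the names of a table's grouping columns. The second call, which groups by am before counting, is an added illustration rather than part of the original answer: "preserves the grouping of its input" also means a grouped input still gives a grouped count.

```r
library(dplyr)

# Ungrouped input: count() returns an ungrouped table.
counted <- mtcars %>% count(wt, cyl)
print(group_vars(counted))        # character(0): no grouping

# Grouped input: count() keeps the input's grouping (am here),
# so ungroup() can still matter when the data was grouped beforehand.
counted_by_am <- mtcars %>%
  group_by(am) %>%
  count(cyl)
print(group_vars(counted_by_am))  # "am"
```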


Note that you may still need to do ungroup() after a group_by() and summarize():

mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n())

returns a tibble still grouped by cyl:

# A tibble: 30 x 3
# Groups:   cyl [?]
     cyl    wt     n
   <dbl> <dbl> <int>
 1     4  1.51     1
 2     4  1.62     1
 3     4  1.84     1
 4     4  1.94     1
 5     4  2.14     1
 6     4  2.2      1
 7     4  2.32     1
 8     4  2.46     1
 9     4  2.78     1
10     4  3.15     1
# ... with 20 more rows
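Two ways to get a flat table from that pipeline are sketched below, assuming dplyr 1.0.0 or later for the `.groups` argument of summarise(). With one-row summaries like n = n(), `.groups = "drop_last"` matches the default behaviour and only silences the grouped-output message. `.groups = "drop"` removes all grouping in the same call, as an alternative to a separate ungroup().

```r
library(dplyr)

# Same pipeline as above: summarise() drops only the last level (wt),
# so the result is still grouped by cyl.
grouped <- mtcars %>%
  group_by(cyl, wt) %>%
  summarise(n = n(), .groups = "drop_last")

# Option 1: remove the remaining grouping explicitly.
flat1 <- grouped %>% ungroup()

# Option 2 (dplyr >= 1.0.0): have summarise() drop all grouping itself.
flat2 <- mtcars %>%
  group_by(cyl, wt) %>%
  summarise(n = n(), .groups = "drop")

print(group_vars(grouped))  # "cyl"
print(group_vars(flat1))    # character(0)
print(group_vars(flat2))    # character(0)
```

Both options give the same 30-row ungrouped table. ungroup() also works on older dplyr versions, while `.groups` keeps the intent visible at the point where the grouping is created.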
