Why does summarise on grouped data result in only overall summary in dplyr?

Problem description

Suppose I have the following data:

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

With the good old plyr I can create a little table summarizing my data with the following code:

require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))

The output looks like this:

  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33

I'm trying to move my code to dplyr and the %>% operator. My code takes the data frame, groups it by group and sex, and then summarises it. That is:

dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

  mean   sd
1 35.56 9.92

What am I doing wrong?

Thanks!

Solution

The problem here is that you are loading dplyr first and then plyr, so plyr's summarise masks dplyr's summarise. plyr's version ignores the grouping set by group_by and returns a single overall row. When plyr is loaded after dplyr, you get this warning:

require(plyr)
    Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, desc, failwith, id, mutate, summarise, summarize

So in order for your code to work, either detach plyr with detach(package:plyr), or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11
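
If you are not sure which summarise is currently active, you can ask R directly. The following is a minimal sketch using only base and utils functions; it assumes a fresh session in which dplyr is installed and is the only one of the two packages attached.

```r
library(dplyr)

# Which namespace does the bare name `summarise` resolve to right now?
# "dplyr" is what you want here; "plyr" would mean plyr was attached
# later and is masking dplyr's version.
environmentName(environment(summarise))

# Every attached package that defines `summarise`, in search-path order.
# The first entry is the one a bare call will use.
find("summarise")

# All names that are defined more than once on the search path.
conflicts(detail = TRUE)
```

If the first call returns "plyr" instead of "dplyr", the masking described above is in effect.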

Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
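
The same namespace-qualified style also lets you keep using plyr without attaching it, so nothing gets masked. Below is a sketch under a few assumptions that are not in the original post: both packages are installed, the seed is arbitrary and only there for repeatability, and the .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)  # plyr is deliberately NOT attached

set.seed(42)  # arbitrary seed, only so the example is repeatable
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex   = sample(c("M", "F"), size = 29, replace = TRUE),
  age   = runif(n = 29, min = 18, max = 54)
)

# dplyr's summarise is called through its namespace, so masking
# cannot change which function runs.
by_dplyr <- dfx %>%
  group_by(group, sex) %>%
  dplyr::summarise(mean = round(mean(age), 2),
                   sd   = round(sd(age), 2),
                   .groups = "drop")

# plyr is used through its namespace as well; the grouping variables
# are passed as a character vector instead of .(group, sex).
by_plyr <- plyr::ddply(dfx, c("group", "sex"), plyr::summarise,
                       mean = round(mean(age), 2),
                       sd   = round(sd(age), 2))

by_dplyr
by_plyr
```

Both results should contain one row per group/sex combination present in the data, with matching means.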
