Why does summarise on grouped data result in only an overall summary in dplyr?
Question
Suppose I have the following data:

    dfx <- data.frame(
      group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54)
    )

With the good old plyr I can create a little table summarizing my data with the following code:

    require(plyr)
    ddply(dfx, .(group, sex), summarize,
          mean = round(mean(age), 2),
          sd = round(sd(age), 2))

The output looks like this:

      group sex  mean    sd
    1     A   F 49.68  5.68
    2     A   M 32.21  6.27
    3     B   F 31.87  9.80
    4     B   M 37.54  9.73
    5     C   F 40.61 15.21
    6     C   M 36.33 11.33

I'm trying to move my code to dplyr and the %>% operator. My code takes the data frame, groups it by group and sex, and then summarises it. That is:

    dfx %>% group_by(group, sex) %>%
      summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

       mean   sd
    1 35.56 9.92

What am I doing wrong?
Thanks!
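To make the symptom concrete: a single row holding only mean and sd is the shape of a summary computed over the whole data frame, while a grouped summary has one row per group/sex combination. The sketch below contrasts the two in base R. It uses a small made-up data frame (df, with hypothetical values) instead of the random dfx, and base R's aggregate() as a stand-in for the grouped summary; it is an illustration, not the code from the question.

```r
# Small made-up data set (hypothetical values, not the random dfx)
df <- data.frame(
  group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sex   = c("F", "F", "M", "M", "F", "F", "M", "M"),
  age   = c(20, 30, 40, 50, 22, 26, 30, 40)
)

# Ignoring the grouping: one row for the whole data frame,
# which is the shape of the single-row output in the question
overall <- data.frame(mean = round(mean(df$age), 2),
                      sd   = round(sd(df$age), 2))
print(overall)

# Respecting the grouping: one row per (group, sex) combination,
# computed here with base R's aggregate() as a stand-in
per_group <- aggregate(age ~ group + sex, data = df,
                       FUN = function(x) round(mean(x), 2))
names(per_group)[names(per_group) == "age"] <- "mean"
print(per_group)
```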
Solution

The problem here is that you are loading dplyr first and then plyr, so plyr's function summarise is masking dplyr's function summarise. plyr's summarise knows nothing about the groups added by group_by(), so it summarises the whole data frame and returns a single row. When plyr is loaded after dplyr like this, you get the following warning:

    > require(plyr)
    Loading required package: plyr
    ------------------------------------------------------------------------------------------
    You have loaded plyr after dplyr - this is likely to cause problems.
    If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
    library(plyr); library(dplyr)
    ------------------------------------------------------------------------------------------

    Attaching package: ‘plyr’

    The following objects are masked from ‘package:dplyr’:

        arrange, desc, failwith, id, mutate, summarise, summarize

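A toy sketch of the masking mechanism that runs without plyr or dplyr: two hypothetical environments, pkg_a and pkg_b, stand in for dplyr and plyr, and each defines a function named summarise. Whichever is attached last sits earlier on the search path, so an unqualified call finds it first; find() lists the search-path entries where the name is defined, in lookup order. The environment names and the strings the functions return are made up for illustration.

```r
# Two hypothetical stand-ins for packages that both define summarise()
pkg_a <- new.env()  # plays the role of dplyr (attached first)
assign("summarise", function(...) "grouped summary (pkg_a)", envir = pkg_a)
pkg_b <- new.env()  # plays the role of plyr (attached second)
assign("summarise", function(...) "overall summary (pkg_b)", envir = pkg_b)

# attach() inserts each one at position 2 of the search path, so pkg_b
# ends up in front of pkg_a; with the default warn.conflicts = TRUE,
# R reports that summarise is masked when pkg_b is attached
attach(pkg_a, name = "pkg_a")
attach(pkg_b, name = "pkg_b")

where_found <- find("summarise")  # entries defining summarise, in lookup order
print(where_found)                # "pkg_b" "pkg_a"

called <- summarise()             # the unqualified call resolves to pkg_b's version
print(called)

# pkg_a's version still exists; it is only hidden behind pkg_b's
hidden <- get("summarise", pos = "pkg_a")()
print(hidden)

# Restore the search path
detach("pkg_b")
detach("pkg_a")
```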
So in order for your code to work, either detach plyr:

    detach(package:plyr)

or restart R and load plyr first and then dplyr (or load only dplyr):

    library(dplyr)
    dfx %>% group_by(group, sex) %>%
      summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

    Source: local data frame [6 x 4]
    Groups: group

      group sex  mean    sd
    1     A   F 41.51  8.24
    2     A   M 32.23 11.85
    3     B   F 38.79 11.93
    4     B   M 31.00  7.92
    5     C   F 24.97  7.46
    6     C   M 36.17  9.11

The numbers differ from the ones in the question only because dfx is generated with sample() and runif() without a fixed seed, so every run produces different data.
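A toy sketch of why both fixes work, runnable without plyr or dplyr: two hypothetical environments, grouped_pkg and overall_pkg, stand in for dplyr and plyr, and each defines summarise(). Detaching the one attached last uncovers the other again, and attaching them in the opposite order leaves the "grouped" stand-in in front. The helper make_pkg() is also hypothetical; with the real packages, use detach(package:plyr) or restart R and load plyr before dplyr.

```r
# Hypothetical helper: builds an environment whose summarise() returns a label
make_pkg <- function(label) {
  force(label)
  env <- new.env()
  assign("summarise", function(...) label, envir = env)
  env
}
grouped_pkg <- make_pkg("grouped summary")  # plays the role of dplyr
overall_pkg <- make_pkg("overall summary")  # plays the role of plyr

# The problem: the "plyr" stand-in is attached last, so it is found first
attach(grouped_pkg, name = "grouped_pkg", warn.conflicts = FALSE)
attach(overall_pkg, name = "overall_pkg", warn.conflicts = FALSE)
before_fix <- summarise()

# Fix 1: detach the masking environment
detach("overall_pkg")
after_detach <- summarise()
detach("grouped_pkg")

# Fix 2: attach in the opposite order ("plyr" stand-in first, "dplyr" last)
attach(overall_pkg, name = "overall_pkg", warn.conflicts = FALSE)
attach(grouped_pkg, name = "grouped_pkg", warn.conflicts = FALSE)
after_reorder <- summarise()

# Restore the search path
detach("grouped_pkg")
detach("overall_pkg")
```

Attaching is only a stand-in for library() here; the point is that lookup follows search-path order, which is what the load order controls.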
Or you can explicitly call dplyr's summarise in your code, so the right function will be called no matter how you load the packages:

    dfx %>% group_by(group, sex) %>%
      dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

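The explicit prefix works because pkg::name looks the function up in that package's namespace directly, so nothing attached on the search path can mask it. A minimal base-R-only sketch, where a hypothetical attached environment named masker defines its own mean():

```r
# Hypothetical environment that masks base::mean once attached
masker <- new.env()
assign("mean", function(x, ...) "masked mean", envir = masker)
attach(masker, name = "masker", warn.conflicts = FALSE)

unqualified <- mean(c(1, 2, 3))        # search path: finds the masker version first
qualified   <- base::mean(c(1, 2, 3))  # namespace lookup: always base's mean

print(unqualified)
print(qualified)

# Restore the search path
detach("masker")
```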