dplyr group_by and cummean functions

Question

I expected the code below to return a data frame with three rows, one per cyl group, where each row holds the cumulative mean of mpg over that group and all groups before it:

library(dplyr)

mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  summarise(running.mean.mpg = cummean(mpg))

This is what I expected to happen:

mean_cyl_4 <- mtcars %>%
  filter(cyl == 4) %>%
  summarise(mean(mpg))

mean_cyl_4_6 <- mtcars %>%
  filter(cyl == 4 | cyl == 6) %>%
  summarise(mean(mpg))

mean_cyl_4_6_8 <- mtcars %>%
  filter(cyl == 4 | cyl == 6 | cyl == 8) %>%
  summarise(mean(mpg))

data.frame(
  cyl = c(4, 6, 8),
  running.mean.mpg = c(mean_cyl_4[1, 1], mean_cyl_4_6[1, 1], mean_cyl_4_6_8[1, 1])
)

  cyl running.mean.mpg
1   4     26.66364
2   6     23.97222
3   8     20.09062

Why does dplyr seem to ignore group_by(cyl)?

Recommended answer

library(dplyr)

# cummean() returns one value per input row, so use mutate() rather than summarise()
mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  mutate(running.mean.mpg = cummean(mpg)) %>%
  select(cyl, running.mean.mpg)

# Source: local data frame [32 x 2]
# Groups: cyl
# 
#     cyl running.mean.mpg
# 1     4         22.80000
# 2     4         23.60000
# 3     4         23.33333
# 4     4         25.60000
# 5     4         26.56000
# 6     4         27.78333
# 7     4         26.88571
# 8     4         26.93750
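
The mutate() call above keeps one row per car because cummean() is a window function: it returns a vector as long as its input, restarting within each cyl group. That is not quite the three-row table the question asks for. The following is a minimal sketch of one way to get that table, assuming the goal is the mean of mpg over all cars in the current and earlier cyl groups. The column names n and total.mpg are made up for this illustration.

```r
library(dplyr)

# cummean() returns one value per input element, not a single summary value
cummean(c(1, 2, 3))

# Collapse each cyl group to a count and a sum first, then accumulate
# across the three summary rows (summarise() returns them sorted by cyl).
running <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), total.mpg = sum(mpg)) %>%
  mutate(running.mean.mpg = cumsum(total.mpg) / cumsum(n)) %>%
  select(cyl, running.mean.mpg)

running
```

Dividing the cumulative sum by the cumulative count weights each group by its size. Applying cummean() directly to the three group means would weight the groups equally and would not reproduce the expected values, because the groups contain different numbers of cars.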

As an experiment, the same per-group cumulative mean also works with data.table. Note that dplyr still has to be loaded, because cummean() comes from dplyr.

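library(dplyr)       # provides cummean()
library(data.table)

DT <- as.data.table(mtcars)
# Per-group running mean of mpg; by= keeps groups in order of first appearance
DT[, list(running.mean.mpg = cummean(mpg)), by = "cyl"]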
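
One difference worth checking: data.table's by returns groups in the order they first appear in the data (6, 4, 8 for mtcars), whereas the dplyr version sorts with arrange(cyl) first. The sketch below uses keyby, which sorts the result by the grouping column, and compares the two outputs. The names dt_result and dplyr_result are introduced only for this comparison.

```r
library(dplyr)
library(data.table)

DT <- as.data.table(mtcars)

# keyby sorts the output by cyl, matching arrange(cyl) in the dplyr pipeline
dt_result <- DT[, list(running.mean.mpg = cummean(mpg)), keyby = cyl]

dplyr_result <- mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  mutate(running.mean.mpg = cummean(mpg)) %>%
  ungroup() %>%
  select(cyl, running.mean.mpg)

# Should be TRUE if both approaches produce the same running means
all.equal(dt_result$running.mean.mpg, dplyr_result$running.mean.mpg)
```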
