Get group mean with multiple grouping variables and excluding own group value
Problem description
I'm looking for a faster way to calculate a group mean with multiple grouping variables while excluding the row's own group value. As a thought experiment: for each county, find the average value (e.g. price) across the other counties in the same state and the same year, excluding that county's own value. Here's a toy data set.
library(dplyr)

df <- data_frame(
  state = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year = rep(c(2011:2012), 6),
  value = sample.int(100, 12)
)

# For each (state, county, year), average the other counties in the same state and year
df %>%
  group_by(state, county, year) %>%
  summarise(q = mean(df$value[df$state == state & df$county != county & df$year == year]))
# Groups: state, county [6]
state county year q
<chr> <chr> <int> <dbl>
1 AL a 2011 56
2 AL a 2012 46
3 AL b 2011 50.5
4 AL b 2012 52
5 AL c 2011 55.5
6 AL c 2012 29
7 CA d 2011 52.5
8 CA d 2012 32
9 CA e 2011 68.5
10 CA e 2012 31.5
11 CA f 2011 32
12 CA f 2012 42.5
The code above gives me the desired result, but when I apply it to a larger dataset (with more grouping variables) it gets really slow. Do you have any suggestions on how to speed this up?
If the original approach is incorrect, please point that out as well.
Recommended answer
library(dplyr)

# Leave-one-out mean: subtract each row's own value from its state-year total
df %>%
  group_by(state, year) %>%
  mutate(q = (sum(value) - value) / (n() - 1))
#> # A tibble: 12 x 5
#> # Groups: state, year [4]
#> state county year value q
#> <chr> <chr> <int> <int> <dbl>
#> 1 AL a 2011 68 30.5
#> 2 AL a 2012 63 42
#> 3 AL b 2011 53 38
#> 4 AL b 2012 56 45.5
#> 5 AL c 2011 8 60.5
#> 6 AL c 2012 28 59.5
#> 7 CA d 2011 7 40
#> 8 CA d 2012 69 41
#> 9 CA e 2011 39 24
#> 10 CA e 2012 79 36
#> 11 CA f 2011 41 23
#> 12 CA f 2012 3 74
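The answer works because, within a state-year group, the mean of the other counties equals the group total minus the row's own value, divided by the group size minus one. The following is a minimal sketch that checks this against a brute-force filter; the seed and the names `fast` and `slow` are assumptions introduced here for illustration, not part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the check is reproducible
df <- tibble(
  state  = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year   = rep(2011:2012, 6),
  value  = sample.int(100, 12)
)

# Fast version: one grouped pass, no per-row filtering.
fast <- df %>%
  group_by(state, year) %>%
  mutate(q = (sum(value) - value) / (n() - 1)) %>%
  ungroup()

# Slow reference: filter the full data frame once per row.
slow <- vapply(seq_len(nrow(df)), function(i) {
  mean(df$value[df$state == df$state[i] &
                df$year == df$year[i] &
                df$county != df$county[i]])
}, numeric(1))

# Compare the two results row by row.
all.equal(fast$q, slow)
```

The grouped version touches each row once, whereas the filtering version scans the whole data frame for every row, which is likely why the original code slows down on larger data. Note that a state-year group containing a single county gives `0 / 0`, i.e. `NaN`.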
Data:
# data_frame() is deprecated; use tibble() instead
df <- tibble(
  state = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year = rep(c(2011:2012), 6),
  value = sample.int(100, 12)
)
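The formula in the answer assumes exactly one row per county in each state-year group, as in the toy data. If a county can appear several times in a group, one possible extension is to subtract the county's whole sum and row count instead of a single value. This is a sketch under that assumption; `df2`, `grp_sum`, `grp_n`, and `res` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical data: county "a" has two rows in the same state and year.
df2 <- tibble(
  state  = "AL",
  county = c("a", "a", "b", "c"),
  year   = 2011L,
  value  = c(10, 20, 30, 50)
)

res <- df2 %>%
  # Totals for the whole state-year group.
  group_by(state, year) %>%
  mutate(grp_sum = sum(value), grp_n = n()) %>%
  # Remove the county's own sum and row count from those totals.
  group_by(state, year, county) %>%
  mutate(q = (grp_sum - sum(value)) / (grp_n - n())) %>%
  ungroup() %>%
  select(-grp_sum, -grp_n)

res$q
# county a: (110 - 30) / (4 - 2) = 40
# county b: (110 - 30) / (4 - 1) = 26.67 (rounded)
# county c: (110 - 50) / (4 - 1) = 20
```

With one row per county this reduces to the answer's `(sum(value) - value) / (n() - 1)`.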