组“加权"具有多个分组变量并排除自己的组值的平均值 [英] group "weighted" mean with multiple grouping variables and excluding own group value
问题描述
我正在尝试使用多个分组变量并排除自己的组值来获得组加权"平均值.这与我之前的帖子有关 Get分组均值具有多个分组变量并排除自己的组值,但是当我将其应用于我的实际问题(即获得加权均值)时,我发现它比获得简单均值要复杂得多.这就是我的意思.
I'm trying to get group "weighted" mean with multiple grouping variables and excluding own group value. This is related to my earlier post Get group mean with multiple grouping variables and excluding own group value, but when I applied it to my actual question (which is getting the weighted mean) I found out that it's much more complicated than getting the simple mean. Here's what I mean by that.
df <- data_frame(
state = rep(c("AL", "CA"), each = 6),
county = rep(letters[1:6], each = 2),
year = rep(c(2011:2012), 6),
value = c(91,46,37,80,33,97,4,19,85,90,56,94),
wt = c(1,4,3,5,1,4,5,1,5,5,4,1)
) %>% arrange(state, year)
对于未加权的平均情况,以下代码(来自我之前帖子的已接受答案)应该可以工作.
For unweighted mean case, the following code (from the accepted answer of my earlier post) should work.
df %>%
group_by(state, year) %>%
mutate(q = (sum(value) - value) / (n()-1))
所需的变量 new_val,即加权平均值,如下所示.例如,new_val 列的前两行计算为 37*3/4 + 33*1/4 = 36, 91*1/2 + 33*1/2 = 62.
The desired variable new_val, which is the weighted mean, would be the following. For instance, the first two rows of new_val column are calculated as 37*3/4 + 33*1/4 = 36, 91*1/2 + 33*1/2 = 62.
# A tibble: 12 x 6
state county year value wt new_val
<chr> <chr> <int> <dbl> <dbl> <dbl>
1 AL a 2011 91 1 36
2 AL b 2011 37 3 62
3 AL c 2011 33 1 50.5
4 AL a 2012 46 4 87.6
5 AL b 2012 80 5 71.5
6 AL c 2012 97 4 64.9
7 CA d 2011 4 5 72.1
8 CA e 2011 85 5 27.1
9 CA f 2011 56 4 44.5
10 CA d 2012 19 1 90.7
11 CA e 2012 90 5 56.5
12 CA f 2012 94 1 78.2
我搜索了考虑加权均值的类似帖子,但所有可用的帖子都针对简单的均值情况.任何意见将不胜感激.谢谢!
I searched for similar posts with weighted mean in mind, but all the available ones were for the simple mean cases. Any comments would be greatly appreciated. Thank you!
推荐答案
我们可以在weighted.mean
library(dplyr)
df %>%
group_by(state, year) %>%
mutate(new_val = purrr::map_dbl(row_number(),
~weighted.mean(value[-.x], wt[-.x])))
# state county year value wt new_val
# <chr> <chr> <int> <dbl> <dbl> <dbl>
# 1 AL a 2011 91 1 36
# 2 AL b 2011 37 3 62
# 3 AL c 2011 33 1 50.5
# 4 AL a 2012 46 4 87.6
# 5 AL b 2012 80 5 71.5
# 6 AL c 2012 97 4 64.9
# 7 CA d 2011 4 5 72.1
# 8 CA e 2011 85 5 27.1
# 9 CA f 2011 56 4 44.5
#10 CA d 2012 19 1 90.7
#11 CA e 2012 90 5 56.5
#12 CA f 2012 94 1 78.2
这篇关于组“加权"具有多个分组变量并排除自己的组值的平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!