Group "weighted" mean with multiple grouping variables and excluding own group value

Question

I'm trying to compute a group "weighted" mean with multiple grouping variables, excluding each row's own value. This is related to my earlier post, "Get group mean with multiple grouping variables and excluding own group value", but when I applied that approach to my actual problem (a weighted mean) I found it is considerably more complicated than the simple mean. Here is what I mean.

library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(
  state = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year = rep(2011:2012, 6),
  value = c(91,46,37,80,33,97,4,19,85,90,56,94),
  wt = c(1,4,3,5,1,4,5,1,5,5,4,1)
) %>% arrange(state, year)

For the unweighted case, the following code (from the accepted answer to my earlier post) works: within each state and year, it subtracts the row's own value from the group sum and divides by the number of remaining rows.

df %>%
  group_by(state, year) %>%
  mutate(q = (sum(value) - value) / (n()-1))

The desired variable new_val, the weighted mean of the other rows in the same state and year, is shown below. For instance, the first two rows of the new_val column are calculated as 37*3/4 + 33*1/4 = 36 and 91*1/2 + 33*1/2 = 62.

# A tibble: 12 x 6
   state county  year value    wt new_val
   <chr> <chr>  <int> <dbl> <dbl>   <dbl>
 1 AL    a       2011    91     1    36  
 2 AL    b       2011    37     3    62
 3 AL    c       2011    33     1    50.5
 4 AL    a       2012    46     4    87.6  
 5 AL    b       2012    80     5    71.5
 6 AL    c       2012    97     4    64.9
 7 CA    d       2011     4     5    72.1
 8 CA    e       2011    85     5    27.1
 9 CA    f       2011    56     4    44.5
10 CA    d       2012    19     1    90.7
11 CA    e       2012    90     5    56.5
12 CA    f       2012    94     1    78.2
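
As a quick check of the arithmetic above, the first two values can be reproduced with base R's weighted.mean. This is only a sketch of the hand calculation for the AL, 2011 group; the names row1 and row2 are introduced here for illustration and are not in the original post.

```r
# Row 1 (county a) is excluded, so only counties b and c contribute:
# values 37 and 33 with weights 3 and 1.
row1 <- weighted.mean(c(37, 33), w = c(3, 1))

# Row 2 (county b) is excluded, so counties a and c contribute:
# values 91 and 33, each with weight 1.
row2 <- weighted.mean(c(91, 33), w = c(1, 1))

print(c(row1, row2))
```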

I searched for similar posts dealing with a weighted mean, but all the ones I found covered only the simple mean. Any comments would be greatly appreciated. Thank you!

Recommended answer

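We can exclude the current row's value and weight inside weighted.mean, iterating over the row indices within each group: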

library(dplyr)

df %>%
  group_by(state, year) %>%
  mutate(new_val = purrr::map_dbl(row_number(), 
                         ~weighted.mean(value[-.x], wt[-.x])))


#   state county  year value    wt new_val
#   <chr> <chr>  <int> <dbl> <dbl>   <dbl>
# 1 AL    a       2011    91     1    36  
# 2 AL    b       2011    37     3    62  
# 3 AL    c       2011    33     1    50.5
# 4 AL    a       2012    46     4    87.6
# 5 AL    b       2012    80     5    71.5
# 6 AL    c       2012    97     4    64.9
# 7 CA    d       2011     4     5    72.1
# 8 CA    e       2011    85     5    27.1
# 9 CA    f       2011    56     4    44.5
#10 CA    d       2012    19     1    90.7
#11 CA    e       2012    90     5    56.5
#12 CA    f       2012    94     1    78.2
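
A possible vectorized alternative, which is not part of the original answer, extends the unweighted formula from the question: subtract the row's own weighted contribution from the group's weighted sum, then divide by the remaining total weight. The sketch below assumes every group has at least two rows and that the remaining weights do not sum to zero; otherwise the division yields NaN or Inf. The name res is introduced here for illustration.

```r
library(dplyr)

# Same data as in the question
df <- tibble(
  state  = rep(c("AL", "CA"), each = 6),
  county = rep(letters[1:6], each = 2),
  year   = rep(2011:2012, 6),
  value  = c(91, 46, 37, 80, 33, 97, 4, 19, 85, 90, 56, 94),
  wt     = c(1, 4, 3, 5, 1, 4, 5, 1, 5, 5, 4, 1)
) %>%
  arrange(state, year)

res <- df %>%
  group_by(state, year) %>%
  # (group weighted sum - own contribution) / (group weight - own weight)
  mutate(new_val = (sum(value * wt) - value * wt) / (sum(wt) - wt)) %>%
  ungroup()

print(res)
```

This avoids calling weighted.mean once per row, which may matter for large groups, at the cost of being less explicit about which rows are dropped.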
