Calculate group mean while excluding current observation using dplyr


Question

Using dplyr (preferably), I am trying to calculate the group mean for each observation while excluding that observation from the group.

It seems that this should be doable with a combination of rowwise() and group_by(), but the two functions cannot be used at the same time.

Given this data frame:

library(dplyr)

# tibble() is the current replacement for the deprecated data_frame()
df <- tibble(grouping = rep(LETTERS[1:5], 3),
             value = 1:15) %>%
  arrange(grouping)
df
#> Source: local data frame [15 x 2]
#> 
#>    grouping value
#>       (chr) (int)
#> 1         A     1
#> 2         A     6
#> 3         A    11
#> 4         B     2
#> 5         B     7
#> 6         B    12
#> 7         C     3
#> 8         C     8
#> 9         C    13
#> 10        D     4
#> 11        D     9
#> 12        D    14
#> 13        E     5
#> 14        E    10
#> 15        E    15

I'd like to get the group mean for each observation, with that observation excluded from the group, resulting in:

#>    grouping value special_mean
#>       (chr) (int)
#> 1         A     1          8.5  # i.e. (6 + 11) / 2
#> 2         A     6            6  # i.e. (1 + 11) / 2
#> 3         A    11          3.5  # i.e. (1 + 6) / 2
#> 4         B     2          9.5
#> 5         B     7            7
#> 6         B    12          4.5
#> 7         C     3          ...

I've attempted nesting rowwise() inside a function called by do(), but haven't gotten it to work. My attempt was along these lines:

special_avg <- function(chunk) {
  chunk %>%
    rowwise() #%>%
    # filter or something...?
}

df %>%
  group_by(grouping) %>%
  do(special_avg(.))


Recommended answer

There is no need to define a custom function. Instead, we can sum all elements of the group, subtract the current value, and divide by the number of elements in the group minus 1.

df %>% group_by(grouping) %>%
        mutate(special_mean = (sum(value) - value)/(n()-1))
#   grouping value special_mean
#      (chr) (int)        (dbl)
#1         A     1          8.5
#2         A     6          6.0
#3         A    11          3.5
#4         B     2          9.5
#5         B     7          7.0
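
As a sanity check, the closed-form expression can be compared against an explicit leave-one-out mean that drops one element at a time. This is only a sketch: the helper loo_mean_explicit() and the column check_mean are hypothetical names introduced here for illustration and are not part of the original answer.

```r
library(dplyr)

df <- tibble(grouping = rep(LETTERS[1:5], 3),
             value = 1:15) %>%
  arrange(grouping)

# Hypothetical helper (not from the original post): for each position i,
# take the mean of the vector with element i removed.
loo_mean_explicit <- function(x) {
  vapply(seq_along(x), function(i) mean(x[-i]), numeric(1))
}

result <- df %>%
  group_by(grouping) %>%
  mutate(special_mean = (sum(value) - value) / (n() - 1),  # closed form from the answer
         check_mean   = loo_mean_explicit(value)) %>%      # explicit version
  ungroup()

# Both columns should agree for every row.
all.equal(result$special_mean, result$check_mean)
```

The closed form does one sum per group, while the explicit version recomputes a mean for every row, so the former is the cheaper choice on large groups. Note that for a group containing a single row, n() - 1 is zero and the expression evaluates to NaN, so such groups may need separate handling.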
