Summarize with dplyr "other than" groups


Problem description

I need to summarize a grouped data_frame (note: a dplyr solution is very much appreciated but isn't mandatory), computing both something on each group (simple) and the same something on the "other" groups.

Minimal example

if(!require(pacman)) install.packages("pacman")
pacman::p_load(dplyr)

df <- data_frame(
    group = c('a', 'a', 'b', 'b', 'c', 'c'),
    value = c(1, 2, 3, 4, 5, 6)
)

res <- df %>%
    group_by(group) %>%
    summarize(
        median        = median(value)
#        median_other  = ... ??? ... # I need the median of all "other"
                                     # groups
#        median_before = ... ??? ... # I need the median of groups (e.g.
                                 #    the "before" in alphabetic order,
                                 #    but clearly every rule which is
                                 #    a "selection function" depending
                                 #    on the actual group is fine)
    )

My expected result is the following:

group    median    median_other    median_before
  a        1.5         4.5               NA
  b        3.5         3.5               1.5
  c        5.5         2.5               2.5

I've searched Google for strings such as "dplyr summarize excluding groups" and "dplyr summarize other then group", and I've searched the dplyr documentation, but I wasn't able to find a solution.

This question (How to summarize value not matching the group using dplyr, https://stackoverflow.com/questions/34327780/how-to-summarize-value-not-matching-the-group-using-dplyr) does not apply here because it works only for sum, i.e. the solution is "function-specific" (and relies on a simple arithmetic function that does not consider the variability within each group). What about more complex functions (e.g. mean, sd, or a user-defined function)? :-)

Thanks to all

PS: summarize() is just an example; the same question applies to mutate() or other dplyr functions that work on groups.

Recommended answer

Here is my solution:

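res <- df %>%
  group_by(group) %>%
  summarise(med_group = median(value),
            # df$value is the full, ungrouped column; `group` is the current group's label
            med_other = (median(df$value[df$group != group]))) %>%
  # lag() takes the median of the immediately preceding group only,
  # not the median of all earlier groups pooled together
  mutate(med_before = lag(med_group))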

> res
Source: local data frame [3 x 4]

      group med_group med_other med_before
  (chr)     (dbl)     (dbl)      (dbl)
1     a       1.5       4.5         NA
2     b       3.5       3.5        1.5
3     c       5.5       2.5        3.5

I was trying to come up with an all-dplyr solution, but base R subsetting works just fine: median(df$value[df$group != group]) returns the median of all observations that are not in the current group.
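The same subsetting idea can be generalized to any summary function and any rule for selecting the "other" groups. The following is only a minimal sketch in base R, not part of the original answer: the helper name summarise_rel and its arguments keep and fun are hypothetical, and it assumes the group labels can be compared with `<` to define "before". With `<` as the rule, the values of all earlier groups are pooled, which reproduces the median_before column from the question's expected result (2.5 for group c).

```r
# Same data as in the question, as a plain data.frame
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply `fun` to the values of every row whose group
# label satisfies keep(row_group, current); return NA when no row qualifies.
summarise_rel <- function(data, current, keep, fun = median) {
  vals <- data$value[keep(data$group, current)]
  if (length(vals) == 0) NA_real_ else fun(vals)
}

groups <- sort(unique(df$group))

# One row per group; each column uses a different selection rule
res <- data.frame(
  group = groups,
  median = vapply(groups, function(g) summarise_rel(df, g, `==`),
                  numeric(1), USE.NAMES = FALSE),
  median_other = vapply(groups, function(g) summarise_rel(df, g, `!=`),
                        numeric(1), USE.NAMES = FALSE),
  median_before = vapply(groups, function(g) summarise_rel(df, g, `<`),
                         numeric(1), USE.NAMES = FALSE)
)

print(res)
```

Passing a different fun (for example mean, sd, or a user-defined function) changes the statistic without touching the selection rule, and a different keep changes which groups count as "other".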

I hope this helps you solve your problem.
