Summarize with dplyr "other than" groups
Problem description
In a grouped data_frame I need to summarize something both for each group (the simple part) and, with the same function, for the "other" groups, i.e. all rows that do not belong to the current group. (Note: a dplyr solution would be much appreciated but is not mandatory.)
Minimal example
if (!require(pacman)) install.packages("pacman")
pacman::p_load(dplyr)
df <- tibble(  # data_frame() in older dplyr versions; now deprecated
  group = c('a', 'a', 'b', 'b', 'c', 'c'),
  value = c(1, 2, 3, 4, 5, 6)
)
res <- df %>%
  group_by(group) %>%
  summarize(
    median = median(value)
    # median_other  = ... ??? ...  # I need the median of all "other" groups
    # median_before = ... ??? ...  # I need the median of the groups "before"
    #                              # (e.g. in alphabetical order), but any rule
    #                              # that acts as a "selection function" depending
    #                              # on the current group is fine
  )
My expected result is:
group median median_other median_before
a 1.5 4.5 NA
b 3.5 3.5 1.5
c 5.5 2.5 2.5
I've searched Google for strings like "dplyr summarize excluding groups" and "dplyr summarize other than group", and I've searched the dplyr documentation, but I wasn't able to find a solution.
The question "How to summarize value not matching the group using dplyr" (https://stackoverflow.com/questions/34327780/how-to-summarize-value-not-matching-the-group-using-dplyr) does not apply here, because its solution only works for sum: it is function-specific and relies on a simple arithmetic trick that does not take the variability within each group into account. What about more complex functions (e.g. mean, sd, or a user-defined function)? :-)
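Excluding the current group by subsetting is not tied to sum. Below is a minimal sketch, assuming dplyr >= 1.0.0 (where cur_group() returns a one-row tibble holding the current group's key) and using tibble() in place of the deprecated data_frame(). Each "other" statistic is computed by dropping the current group's rows from the full data before calling the function. range_width() is a hypothetical user-defined function standing in for "any summary".

```r
library(dplyr)

df <- tibble(
  group = c('a', 'a', 'b', 'b', 'c', 'c'),
  value = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical user-defined summary: width of the value range.
range_width <- function(x) diff(range(x))

res <- df %>%
  group_by(group) %>%
  summarise(
    # cur_group()$group is the current group's single label, so the
    # subset keeps every value of df that lies outside this group.
    mean_other  = mean(df$value[df$group != cur_group()$group]),
    sd_other    = sd(df$value[df$group != cur_group()$group]),
    width_other = range_width(df$value[df$group != cur_group()$group])
  )
res
```

Any function that takes a numeric vector and returns a single value can be dropped into the same slot.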
Thanks to all!
PS: summarize() is just an example; the same question applies to mutate() and other dplyr functions that work on groups.
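The same subsetting pattern can be used inside mutate(). A sketch under the same assumptions (dplyr >= 1.0.0 for cur_group()): the "other" median is a single value per group, recycled to every row of that group, and diff_other is only an illustrative row-level column, not something from the question.

```r
library(dplyr)

df <- tibble(
  group = c('a', 'a', 'b', 'b', 'c', 'c'),
  value = c(1, 2, 3, 4, 5, 6)
)

res_rows <- df %>%
  group_by(group) %>%
  mutate(
    # One value per group, recycled to all rows of that group.
    median_other = median(df$value[df$group != cur_group()$group]),
    # Illustrative row-level use: distance from the other groups' median.
    diff_other   = value - median_other
  ) %>%
  ungroup()
res_rows
```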
Recommended answer
Here is my solution:
res <- df %>%
  group_by(group) %>%
  summarise(med_group = median(value),
            med_other = median(df$value[df$group != group])) %>%
  mutate(med_before = lag(med_group))
> res
Source: local data frame [3 x 4]
group med_group med_other med_before
(chr) (dbl) (dbl) (dbl)
1 a 1.5 4.5 NA
2 b 3.5 3.5 1.5
3 c 5.5 2.5 3.5
I was trying to come up with an all-dplyr solution, but base R subsetting works just fine: median(df$value[df$group != group]) returns the median of all observations that are not in the current group.
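Note that lag(med_group) gives the median of the immediately preceding group only, so med_before for c is 3.5, whereas the expected table in the question has 2.5, the median of all values in a and b (1, 2, 3, 4). Below is a minimal sketch that reproduces the expected table by treating each column as a "selection rule" on the group labels, as the question suggests. summary_by_rule() is a hypothetical helper, not a dplyr function, and "before" is assumed to mean alphabetical order via `<` on the labels.

```r
library(dplyr)

df <- tibble(
  group = c('a', 'a', 'b', 'b', 'c', 'c'),
  value = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical helper: for each key, keep the values whose group satisfies
# rule(data$group, key) and summarise them with f. An empty selection gives NA.
summary_by_rule <- function(data, keys, f, rule) {
  vapply(keys, function(k) {
    v <- data$value[rule(data$group, k)]
    if (length(v) == 0) NA_real_ else f(v)
  }, numeric(1), USE.NAMES = FALSE)
}

res <- df %>%
  distinct(group) %>%
  arrange(group) %>%
  mutate(
    # stats::median is spelled out because, once the `median` column exists,
    # a bare `median` passed as an argument would resolve to that column.
    median        = summary_by_rule(df, group, stats::median, `==`),
    median_other  = summary_by_rule(df, group, stats::median, `!=`),
    median_before = summary_by_rule(df, group, stats::median, `<`)
  )
res
```

Swapping f (for example sd or a user-defined function) or the comparison function changes the statistic or the selection rule without touching the helper.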
I hope this helps you solve your problem.