Summarise with row-based filter for multiple operations
Question
I have a data frame df that should be used to compute some ratios (divisions) after a group_by, for multiple entries based on the values of a single column (cond3):
cond1 cond2 cond3 value
foo   oof   A     1
foo   oof   B     2
foo   oof   D     3
foo   bar   A     1
foo   bar   B     2
foo   bar   C     4
foo   bar   D     4
buz   oof   A     2
buz   oof   C     1
buz   oof   B     3
bar   rab   C     3
bar   rab   B     4
bar   rab   D     2
I can accomplish this for a single selection divided by another as follows:
df %>%
  group_by(cond1, cond2) %>%
  summarise(ratio = value[cond3 == "A"] / value[cond3 == "B"])
Now, let's assume I have two lists like:
list1 <- c("A","C")
list2 <- c("B","D")
and I want to perform the division for every combination of an entry from list1 with an entry from list2. This can be done explicitly like this:
df %>%
  group_by(cond1, cond2) %>%
  summarise(ratio_AB = value[cond3 == "A"] / value[cond3 == "B"],
            ratio_AD = value[cond3 == "A"] / value[cond3 == "D"],
            ratio_CB = value[cond3 == "C"] / value[cond3 == "B"],
            ratio_CD = value[cond3 == "C"] / value[cond3 == "D"])
I would like to have this done implicitly, like a pseudo-loop:
df %>%
  group_by(cond1, cond2) %>%
  summarise(ratios = value[cond3 %in% list1] / value[cond3 %in% list2])
The expected output without the average:
  cond1 cond2 ratio_AB ratio_AD ratio_CB ratio_CD
1 foo   oof       0.5      0.33       NA       NA
2 foo   bar       0.5      0.25     2         1
3 buz   oof       0.67       NA     0.33     NA
4 bar   rab        NA        NA     0.75      1.5
NOTE: This is based on my example. The full df will contain all four conditions (A, B, C, D) in every group, and thus no NA values are expected.
The latter obviously does not work. If I want to avoid a loop that nests the summarise operation, how would I go about this?
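To illustrate why the %in% version falls short, here is a minimal base R sketch using the values of the foo/bar group from the example. The named vector value and the object names elementwise and all_ratios are introduced here for illustration only. Dividing two subsetted vectors pairs them position by position, so only A/B and C/D are computed, whereas outer() produces every numerator/denominator combination.

```r
# Values of the foo/bar group from the example, named by cond3 (illustrative)
value <- c(A = 1, B = 2, C = 4, D = 4)
list1 <- c("A", "C")
list2 <- c("B", "D")

# Element-wise division pairs positions: first with first, second with second,
# so this only yields A/B and C/D
elementwise <- value[list1] / value[list2]

# outer() applies "/" to every (numerator, denominator) pair and returns a
# matrix with rows named after list1 and columns named after list2
all_ratios <- outer(value[list1], value[list2], "/")
print(all_ratios)
```

The matrix form is handy for a single group; inside a grouped summarise, each combination still needs its own output column.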
Recommended answer
We create a dataset of combinations from 'list1' and 'list2' with crossing, then loop over its rows with pmap. For each combination, we group the original dataset 'df' by 'cond1' and 'cond2' and summarise, creating a 'ratio_' column by subsetting 'value' on 'cond3' according to that combination. Finally, we reduce the resulting datasets to a single dataset with full_join.
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
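# one row per (numerator, denominator) combination
crossing(list1, list2) %>%
  # ..1 is the entry from list1, ..2 the entry from list2
  pmap(~ df %>%
         group_by(cond1, cond2) %>%
         summarise(!! str_c('ratio_', ..1, ..2) :=
                     value[cond3 == ..1] / value[cond3 == ..2],
                   .groups = 'drop')) %>%
  # groups missing from one result are filled with NA by the join
  reduce(full_join, by = c('cond1', 'cond2'))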
Output:
# A tibble: 4 x 6
# cond1 cond2 ratio_AB ratio_AD ratio_CB ratio_CD
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 buz oof 0.667 NA 0.333 NA
#2 foo bar 0.5 0.25 2 1
#3 foo oof 0.5 0.333 NA NA
#4 bar rab NA NA 0.75 1.5
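As a possible alternative, not part of the original answer, the data could first be reshaped to wide format so that each ratio becomes a plain column division. The sketch below assumes each (cond1, cond2, cond3) combination occurs at most once; the names wide, combos, num, den and result are hypothetical. It uses a short loop over the combinations, but that loop only adds columns and does not nest summarise.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- tribble(
  ~cond1, ~cond2, ~cond3, ~value,
  "foo", "oof", "A", 1,
  "foo", "oof", "B", 2,
  "foo", "oof", "D", 3,
  "foo", "bar", "A", 1,
  "foo", "bar", "B", 2,
  "foo", "bar", "C", 4,
  "foo", "bar", "D", 4,
  "buz", "oof", "A", 2,
  "buz", "oof", "C", 1,
  "buz", "oof", "B", 3,
  "bar", "rab", "C", 3,
  "bar", "rab", "B", 4,
  "bar", "rab", "D", 2
)
list1 <- c("A", "C")
list2 <- c("B", "D")

# One row per (cond1, cond2), one column per cond3 level; missing levels become NA
wide <- df %>% pivot_wider(names_from = cond3, values_from = value)

# All numerator/denominator combinations
combos <- crossing(num = list1, den = list2)

# Add one ratio column per combination; this is plain column arithmetic
for (i in seq_len(nrow(combos))) {
  num <- combos$num[i]
  den <- combos$den[i]
  wide[[paste0("ratio_", num, den)]] <- wide[[num]] / wide[[den]]
}

result <- wide %>% select(cond1, cond2, starts_with("ratio_"))
print(result)
```

Because missing levels become NA after widening, groups that lack a numerator or denominator simply get NA ratios, without needing a join.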
The !! on the lhs of the assignment operator := evaluates the string created with str_c so that it is used as the column name. Usually, when we assign with =, the lhs is taken literally as an unquoted column name. In base R, we would use setNames with paste to make new column names.
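The naming mechanism can be sketched in isolation as follows. The toy data frame df_small and the variable new_name are assumptions made for illustration; the point is only to contrast !! with := against the base R setNames and paste route.

```r
library(dplyr)

df_small <- data.frame(x = c(1, 2, 3))   # hypothetical toy data
new_name <- paste0("ratio_", "A", "B")   # builds the string "ratio_AB"

# Tidy evaluation: !! injects the string as the column name,
# and := is needed because the lhs is not a literal name
out_tidy <- df_small %>% summarise(!!new_name := sum(x))

# Base R: compute the value first, then attach the name with setNames
out_base <- setNames(data.frame(sum(df_small$x)), new_name)

print(names(out_tidy))
print(names(out_base))
```

Both results carry a single column called ratio_AB; with a plain = the column would instead be named literally after whatever is written on the lhs.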