Summarise with row-based filter for multiple operations


Question

I have a df that should be used to compute some ratios (divisions) after a group_by, for multiple combinations based on the entries of a single column (cond3):

cond1 cond2 cond3 value
foo   oof   A     1
foo   oof   B     2
foo   oof   D     3
foo   bar   A     1
foo   bar   B     2
foo   bar   C     4
foo   bar   D     4
buz   oof   A     2
buz   oof   C     1
buz   oof   B     3
bar   rab   C     3
bar   rab   B     4  
bar   rab   D     2 
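
To make the examples reproducible, the table above could be rebuilt as follows. This is only a sketch: it assumes the tibble package is available and that value is numeric, neither of which is stated in the question.

```r
library(tibble)

# Rebuild the example data from the table above, row by row
df <- tribble(
  ~cond1, ~cond2, ~cond3, ~value,
  "foo",  "oof",  "A",    1,
  "foo",  "oof",  "B",    2,
  "foo",  "oof",  "D",    3,
  "foo",  "bar",  "A",    1,
  "foo",  "bar",  "B",    2,
  "foo",  "bar",  "C",    4,
  "foo",  "bar",  "D",    4,
  "buz",  "oof",  "A",    2,
  "buz",  "oof",  "C",    1,
  "buz",  "oof",  "B",    3,
  "bar",  "rab",  "C",    3,
  "bar",  "rab",  "B",    4,
  "bar",  "rab",  "D",    2
)
```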

I can accomplish this for a single selection, dividing one entry by another, as follows:

df %>% group_by(cond1, cond2) %>% 
  summarise(ratio = value[cond3 == "A"] / value[cond3 == "B"])

Now, let's assume I have two lists like:

list1 <- c("A","C")
list2 <- c("B","D")

and I want to perform the division for multiple combinations. This can be done explicitly like this:

df %>% group_by(cond1, cond2) %>% 
  summarise(ratio_AB = value[cond3 == "A"] / value[cond3 == "B"],
            ratio_AD = value[cond3 == "A"] / value[cond3 == "D"],
            ratio_CB = value[cond3 == "C"] / value[cond3 == "B"],
            ratio_CD = value[cond3 == "C"] / value[cond3 == "D"])

I would like to have this done implicitly, like a pseudo-loop:

df %>% group_by(cond1, cond2) %>% 
  summarise(ratios = value[cond3 %in% list1] / value[cond3 %in% list2])

The expected output (without any averaging):

   cond1 cond2 ratio_AB ratio_AD ratio_CB ratio_CD
 1 foo   oof   0.5      0.33     NA       NA
 2 foo   bar   0.5      0.25     2        1
 3 buz   oof   0.67     NA       0.33     NA
 4 bar   rab   NA       NA       0.75     1.5

NOTE: This is based on my example. The full df will contain all four conditions (A, B, C, D) in every group, so no NA values are expected there.

The pseudo-loop version obviously does not work. If I want to avoid a loop that nests the summarise operation, how would I go about this?

Recommended answer

We build every combination of 'list1' and 'list2' with crossing and loop over the rows of that combinations dataset with pmap. For each pair, we group the original dataset 'df' by 'cond1' and 'cond2' and summarise, creating a 'ratio_' column after subsetting 'value' on 'cond3' according to the pair. The resulting list of datasets is then reduced to a single dataset with full_join. A group that lacks one of the two levels produces no row for that pair, so the full join fills that ratio with NA.

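library(dplyr)
library(tidyr)
library(purrr)
library(stringr)

# df, list1 and list2 are as defined in the question
crossing(list1, list2) %>%   # all (numerator, denominator) pairs
   pmap(~ df %>%             # ..1 = entry from list1, ..2 = entry from list2
            group_by(cond1, cond2) %>%
            summarise(!! str_c('ratio_', ..1, ..2) :=
                 value[cond3 == ..1]/value[cond3 == ..2], .groups = 'drop')) %>% 
   reduce(full_join, by = c('cond1', 'cond2'))   # merge the per-pair results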

Output:

# A tibble: 4 x 6
#  cond1 cond2 ratio_AB ratio_AD ratio_CB ratio_CD
#  <chr> <chr>    <dbl>    <dbl>    <dbl>    <dbl>
#1 buz   oof      0.667   NA        0.333     NA  
#2 foo   bar      0.5      0.25     2          1  
#3 foo   oof      0.5      0.333   NA         NA  
#4 bar   rab     NA       NA        0.75       1.5
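
As a possible alternative, and not the method used in this answer, the same table could be obtained by reshaping to wide format first, so that every group keeps exactly one row and missing levels become NA directly. The sketch below assumes each (cond1, cond2, cond3) combination occurs at most once; the names wide, pairs, ratio_cols and result are illustrative only.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question
df <- tribble(
  ~cond1, ~cond2, ~cond3, ~value,
  "foo", "oof", "A", 1,  "foo", "oof", "B", 2,  "foo", "oof", "D", 3,
  "foo", "bar", "A", 1,  "foo", "bar", "B", 2,  "foo", "bar", "C", 4,
  "foo", "bar", "D", 4,  "buz", "oof", "A", 2,  "buz", "oof", "C", 1,
  "buz", "oof", "B", 3,  "bar", "rab", "C", 3,  "bar", "rab", "B", 4,
  "bar", "rab", "D", 2
)
list1 <- c("A", "C")
list2 <- c("B", "D")

# One row per (cond1, cond2); one column per cond3 level, NA where absent
wide <- df %>% pivot_wider(names_from = cond3, values_from = value)

# All numerator/denominator pairs: (A,B), (A,D), (C,B), (C,D)
pairs <- crossing(num = list1, den = list2)

# Divide the matching columns for each pair and name the results
ratio_cols <- Map(function(num, den) wide[[num]] / wide[[den]],
                  pairs$num, pairs$den)
names(ratio_cols) <- paste0("ratio_", pairs$num, pairs$den)

result <- bind_cols(select(wide, cond1, cond2), as_tibble(ratio_cols))
result
```

With this approach the rows come out in order of first appearance in df rather than in the order produced by the successive joins.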

The !! on the lhs of the assignment evaluates the string created with str_c so that it is used as the column name; this form requires := instead of =. With a plain =, the lhs is taken literally as an unquoted column name. In base R, we would use setNames with paste to make the new column names.
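
The difference can be seen on a small example. The data frame toy and the name new_name below are hypothetical and only serve to illustrate the two ways of naming a computed column.

```r
library(dplyr)

# Hypothetical toy data, not taken from the question
toy <- data.frame(x = c(2, 4), y = c(1, 2))
new_name <- paste0("ratio_", "x", "y")   # the string "ratio_xy"

# dplyr: `!!` injects the string as the column name; `:=` is needed
# because the left-hand side is computed rather than written literally
out_dplyr <- toy %>% summarise(!!new_name := sum(x) / sum(y))

# base R: compute the value first, then attach the name with setNames()
out_base <- setNames(data.frame(sum(toy$x) / sum(toy$y)), new_name)

names(out_dplyr)
names(out_base)
```

Both results hold a single column called ratio_xy; writing new_name = ... inside summarise would instead create a column literally named new_name.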
