Translating filter_all(any_vars()) to filter(across())

Published: 2020/10/26 4:36:47 | Tags: r, dplyr


Question


While updating my own answer to another thread, I couldn't come up with a good replacement for the last example (see below). The idea is to get all rows where any column contains a certain string, in my example "V".

library(tidyverse)

#get all rows where any column contains 'V'
diamonds %>%
  filter_all(any_vars(grepl('V',.))) %>%
  head
#> # A tibble: 6 x 10
#>   carat cut       color clarity depth table price     x     y     z
#>   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#> 2 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#> 3 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
#> 4 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
#> 5 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#> 6 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49


# naturally, this does not give the desired output!
diamonds %>%
  filter(across(everything(), ~ grepl('V', .))) %>%
  head
#> # A tibble: 0 x 10

I found a poster thinking about something similar:

### don't run, this is ugly and does not work
diamonds %>%
  rowwise %>%
  filter(any(grepl("V", across(everything())))) %>%
  head

Recommended answer


This is tricky, because your example keeps a row when any of its columns meets the condition (i.e. you want a union, a logical OR across columns). That is what filter_all() together with any_vars() does.


By contrast, filter(across(everything(), ...)) keeps only the rows in which all columns meet the condition (i.e. an intersection, a logical AND across columns), which is quite the opposite.
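A quick way to see why the across() version in the question returns zero rows is to ask, column by column, whether a "V" appears anywhere at all. This check is a small sketch added here, not part of the original answer; it assumes diamonds comes from ggplot2 (loaded by tidyverse in the question).

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# For each column: does ANY value in that column contain "V"?
# grepl() coerces factors to their labels and numbers to text before matching.
col_has_v <- diamonds %>%
  summarise(across(everything(), ~ any(grepl("V", .x))))

col_has_v
```

Only cut (through "Very Good") and clarity (VS1, VVS2, ...) ever contain a "V"; the numeric columns never can. Requiring all ten columns to match in the same row is therefore impossible, so the intersection is empty.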


To turn the intersection into a union (i.e. to get back the rows where any of the columns meets the condition), you can check the row sums instead:

diamonds %>%
   filter(rowSums(across(everything(), ~grepl("V", .x))) > 0)


grepl() returns TRUE or FALSE for every cell, and rowSums() counts the TRUEs in each row. If at least one value in a row meets the condition, that row's sum is > 0 and the row is kept.
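To make the counting step visible, here is a sketch on a tiny made-up table; the columns x and y are hypothetical and not from the thread. It shows the logical columns that across() produces and the per-row counts that rowSums() derives from them.

```r
library(dplyr)

# Hypothetical toy data: row 1 matches in x, row 2 in y, row 3 nowhere
df <- tibble(
  x = c("V1", "a", "b"),
  y = c("c", "V2", "d")
)

# Inside transmute(), across() yields one logical column per input column
hits <- df %>% transmute(across(everything(), ~ grepl("V", .x)))

# rowSums() treats TRUE as 1 and FALSE as 0, so this counts matches per row
counts <- rowSums(hits)
counts
# [1] 1 1 0

# A count above zero means "at least one column matched"
kept <- df %>% filter(rowSums(across(everything(), ~ grepl("V", .x))) > 0)
kept$x
# [1] "V1" "a"
```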


Sorry that across() is not the direct child of filter() here, but it's at least one way to do it. :-)
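For readers on newer dplyr: to my knowledge, dplyr 1.0.4 (released after this thread) added if_any() and if_all(), which put the column-wise predicate directly inside filter(), and later releases deprecate passing across() straight to filter() in favour of them. The sketch below assumes dplyr >= 1.0.4 and compares if_any() against the filter_all() pipeline from the question.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Assumes dplyr >= 1.0.4, where if_any() and if_all() exist.
# if_any(): keep a row if the predicate is TRUE in at least one column (union).
# if_all(): keep a row only if it is TRUE in every column (intersection).
new_style <- diamonds %>%
  filter(if_any(everything(), ~ grepl("V", .x)))

# The superseded scoped-verb version from the question
old_style <- diamonds %>%
  filter_all(any_vars(grepl("V", .)))

identical(new_style, old_style)
```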

Evaluation:


Using @TimTeaFan's method to check that both versions return the same result:

identical(
  diamonds %>%
    filter_all(any_vars(grepl('V', .))),
  diamonds %>%
    filter(rowSums(across(everything(), ~grepl("V", .x))) > 0)
)
#> [1] TRUE

Benchmark:


Following our discussion under TimTeaFan's answer, here is a benchmark. Surprisingly, all solutions take about the same time:

library(tidyverse)
microbenchmark::microbenchmark(
  filter_all = {diamonds %>%
      filter_all(any_vars(grepl('V',.)))}, 
  purrr_reduce = {diamonds %>%
      filter(across(everything(), ~ grepl('V', .)) %>% purrr::reduce(`|`))},
  base_reduce = {diamonds %>%
      filter(across(everything(), ~ grepl('V', .)) %>% Reduce(`|`, .))},
  rowsums = {diamonds %>%
      filter(rowSums(across(everything(), ~grepl("V", .x))) > 0)},
  times = 100L,
  check = "identical"
)
#> Unit: milliseconds
#>          expr      min       lq     mean   median       uq      max neval
#>    filter_all 295.7235 302.1311 309.6455 305.0491 310.0335 449.3619   100
#>  purrr_reduce 297.8220 302.4411 310.2829 306.2929 312.2278 461.0194   100
#>   base_reduce 298.5033 303.6170 309.4147 306.1839 312.3518 409.5273   100
#>       rowsums 295.3863 301.0281 307.8517 305.3142 309.4793 372.8867   100


Created on 2020-07-14 by the reprex package (v0.3.0)
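The two reduce variants in the benchmark work because a data frame is a list of columns: Reduce(`|`, ...) folds the logical columns produced by across() with element-wise OR, and purrr::reduce() does the same fold. A minimal sketch with hypothetical logical columns standing in for the across() output:

```r
# Hypothetical logical columns, standing in for what across() returns
hits <- data.frame(
  x = c(TRUE,  FALSE, FALSE),
  y = c(FALSE, TRUE,  FALSE),
  z = c(FALSE, FALSE, FALSE)
)

# A data frame is a list of columns, so Reduce() computes (x | y) | z element-wise
any_hit <- Reduce(`|`, hits)
any_hit
# [1]  TRUE  TRUE FALSE

# Same rows as the rowSums() > 0 test used in the answer
all(any_hit == (rowSums(hits) > 0))
# [1] TRUE
```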

