在数据库的dplyr过滤器函数中传递SQL函数 [英] Pass SQL functions in dplyr filter function on database

查看:89
本文介绍了在数据库的dplyr过滤器函数中传递SQL函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 dplyr 的自动SQL后端从数据库表中查询子表。例如,

I'm using dplyr's automatic SQL backend to query subtable from a database table. E.g.

my_tbl <- tbl(my_db, "my_table")

其中数据库中的 my_table 看起来像

where my_table in the database looks like

batch_name    value
batch_A_1     1
batch_A_2     2
batch_A_2     3
batch_B_1     8
batch_B_2     9
...

我只想要 batch_A _#中的数据,不论数量。

I just want the data from batch_A_#, regardless of the number.

如果我是用SQL编写的,则可以使用

If I were writing this in SQL, I could use

select * where batch_name like 'batch_A_%'

如果我在R中编写此代码,则可以使用几种获取方法: grepl()%in% str_detect()

If I were writing this in R, I could use a few ways to get this: grepl(), %in%, or str_detect()

# option 1
subtable <- my_tbl %>% select(batch_name, value) %>%
    filter(grepl('batch_A_', batch_name, fixed = T))
# option 2
subtable <- my_tbl %>% select(batch_name, value) %>%
    filter(str_detect(batch_name, 'batch_A_'))

所有这些都会产生以下Postgres错误:提示:没有函数与给定的名称和参数类型匹配。您可能需要添加显式类型强制转换

All of these give the following Postgres error: HINT: No function matches the given name and argument types. You might need to add explicit type casts

因此,如何传递SQL字符串函数或匹配函数以帮助生成生成的dplyr SQL查询能够在过滤器中使用更灵活的功能?

So, how do I pass in SQL string functions or matching functions to help make the generated dplyr SQL query able to use a more flexible range of functions in filter?

(仅供参考, %in%函数确实有效,但需要列出所有可能的值,可以将其与 paste 组合以创建列表,但不可以

(FYI the %in% function does work, but requires listing out all possible values. This would be okay combined with paste to make a list, but does not work in a more general regex case)

推荐答案




dplyr 的解决方案是

tbl(my_con, "my_table") %>% 
  filter(batch_name %like% "batch_A_%") %>% 
  collect()

完整 reprex

suppressPackageStartupMessages({
  library(dplyr)
  library(dbplyr)
  library(RPostgreSQL)
})

my_con <- 
  dbConnect(
    PostgreSQL(),
    user     = "my_user",
    password = "my_password",
    host     = "my_host",
    dbname   = "my_db"
  )

my_table <- tribble(
  ~batch_name,    ~value,
  "batch_A_1",     1,
  "batch_A_2",     2,
  "batch_A_2",     3,
  "batch_B_1",     8,
  "batch_B_2",     9
)

copy_to(my_con, my_table)

tbl(my_con, "my_table") %>% 
  filter(batch_name %like% "batch_A_%") %>% 
  collect()
#> # A tibble: 3 x 2
#>   batch_name value
#> *      <chr> <dbl>
#> 1  batch_A_1     1
#> 2  batch_A_2     2
#> 3  batch_A_2     3

dbDisconnect(my_con)
#> [1] TRUE

之所以有效,是因为dplyr的任何函数都不知道如何使用
翻译将按原样传递,请参见
?dbplyr ::

This works because any functions that dplyr doesn't know how to translate will be passed along as is, see ?dbplyr::translate\_sql.

帽子提示到 @ PaulRougieux 的最新评论
此处

Hat-tip to @PaulRougieux for his recent comment here

这篇关于在数据库的dplyr过滤器函数中传递SQL函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆