如何给dplyr一个SQL查询并让它返回一个远程tbl对象? [英] How to give dplyr a SQL query and have it return a remote tbl object?

查看:117
本文介绍了如何给dplyr一个SQL查询并让它返回一个远程tbl对象?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我使用dbplyr打开了一个远程tbl,并且我想在上面使用SQL查询(也许是因为我想做的事情没有dbplyr转换),我该如何给出它以便它返回一个远程tbl

Say I have a remote tbl open using dbplyr, and I want to use a SQL query on it (maybe because there's not dbplyr translation for what I want to do), how do I give it such that it returns a remote tbl object?

DBI :: dbGetQuery()函数可让您对db进行查询,但返回

The DBI::dbGetQuery() function allows you to give a query to db, but it returns a data frame on memory, and not an remote tbl object.

例如,假设您已经有一个连接 con 打开数据库,您可以创建一个像这样的表:

For example, say you already have a connection con open to a db, you can create a table like this:

library(tidyverse)

x_df <- expand.grid(A = c('a','b','c'), B = c('d','e','f', 'g','h')) %>% 
  mutate(C = round(rnorm(15), 2))

DBI::dbWriteTable(conn = con,
                  name = "x_tbl",
                  value = x_df,
                  overwrite = TRUE)

x_tbl = tbl(con, 'x_tbl')

sql_query <- build_sql('SELECT a, b, c, avg(c) OVER (PARTITION BY a) AS mean_c FROM x_tbl')

y_df <- DBI::dbGetQuery(con, sql_query) # This returns a data frame on memory

y_tbl <- x_tbl %>% 
  group_by(a) %>% 
  mutate(mean_c = mean(c))

show_query(y_tbl) # This is a remote tbl object

在这种情况下,我可以使用 y_tbl 。但是在某些情况下,该函数未在dbplyr上进行翻译(例如,分位数不起作用),我需要使用SQL代码。但我不想收集结果,而是希望它创建一个远程tbl对象。有没有一种方法可以给我一个SQL查询(例如 dbGetQuery()),但是它可以返回远程tbl?

In this case, I could just use y_tbl. But there are cases in which the function has not been translated on dbplyr (for example, quantile doesn't work), and I need to use SQL code. But I don't want to collect the result, I want it to create a remote tbl object. Is there a way I can give a SQL query (like with dbGetQuery()) but have it return a remote tbl?

谢谢

推荐答案

好吧,尝试一下它的工作原理,我想我找到了一种方法。您可以在mutate函数中进行sql查询:

Well, playing with how it works, I think I found a way. You can give a sql query inside the mutate function:

y_tbl <- x_tbl %>% 
  group_by(a) %>% 
  mutate(mean_c = sql("avg(c) OVER (PARTITION BY a)"))

show_query(y_tbl) # This is a remote tbl object

这将使您无需修改​​即可给出变量的SQL定义桌子也一样。

This will let you give a SQL definition of a variable without having to compute the table too.

这篇关于如何给dplyr一个SQL查询并让它返回一个远程tbl对象?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆