使用dbplyr跨数据库联接 [英] Joining across databases with dbplyr

查看:932
本文介绍了使用dbplyr跨数据库联接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用dbplyr处理数据库表

I am working with database tables with dbplyr

我有一个本地表,并希望将其与数据库上的大表(150m行)连接起来

I have a local table and want to join it with a large (150m rows) table on the database

数据库生产是只读的

# Set up the connection and point to the table

library(odbc); library(dbplyr)    

my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;DATABASE=PRODUCTION;UID=",
                            t2690_username,";PWD=",t2690_password, sep="")

t2690 <- dbConnect(odbc::odbc(), .connection_string=my_conn_string)

order_line <- tbl(t2690, "order_line") #150m rows

我也有一个本地表,我们称其为订单

I also have a local table, let's call it orders

# fill df with random data

orders <- data.frame(matrix(rexp(50), nrow = 100000, ncol = 5))

names(orders) <- c("customer_id", paste0(rep("variable_", 4), 1:4))

我想加入这两个表,但出现以下错误:

let's say I wanted to join these two tables, I get the following error:

complete_orders <- orders %>% left_join(order_line)

> Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)

问题是,如果我要设置 copy = TRUE ,它将尝试下载整个 order_line ,我的计算机很快就会用完内存

The issue is, if I were to set copy = TRUE, it would try to download the whole of order_line and my computer would quickly run out of memory

另一种选择是将 orders 表上传到数据库。这里的问题是 PRODUCTION 数据库是只读的-我必须上载到其他数据库。尝试在dbplyr中跨数据库复制会导致相同的错误。

Another option could be to upload the orders table to the database. The issue here is that the PRODUCTION database is read only - I would have to upload to a different database. Trying to copy across databases in dbplyr results in the same error.

我发现的唯一解决方案是将其上传到可写数据库中并使用sql将它们连接起来,这是

The only solution I have found is to upload into the writable database and use sql to join them, which is far from ideal

推荐答案

我找到了答案,可以使用 in_schema()函数可跨同一连接内的模式进行工作

I have found the answer, you can use the in_schema() function within the tbl pointer to work across schemas within the same connection

# Connect without specifying a database
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;UID=",
                            t2690_username,";PWD=",t2690_password, sep="")    

# Upload the local table to the TEMP db then point to it
orders <- tbl(t2690, in_schema("TEMP", "orders"))

order_line <-  tbl(t2690, in_schema("PRODUCTION", "order_line"))

complete_orders <- orders %>% left_join(order_line)

这篇关于使用dbplyr跨数据库联接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆