使用dbplyr跨数据库联接 [英] Joining across databases with dbplyr
问题描述
我正在使用dbplyr处理数据库表
I am working with database tables with dbplyr
我有一个本地表,并希望将其与数据库上的大表(150m行)连接起来
I have a local table and want to join it with a large (150m rows) table on the database
数据库生产
是只读的
# Set up the connection and point to the table
library(odbc); library(dbplyr)
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;DATABASE=PRODUCTION;UID=",
t2690_username,";PWD=",t2690_password, sep="")
t2690 <- dbConnect(odbc::odbc(), .connection_string=my_conn_string)
order_line <- tbl(t2690, "order_line") #150m rows
我也有一个本地表,我们称其为订单
I also have a local table, let's call it orders
# fill df with random data
orders <- data.frame(matrix(rexp(50), nrow = 100000, ncol = 5))
names(orders) <- c("customer_id", paste0(rep("variable_", 4), 1:4))
我想加入这两个表,但出现以下错误:
let's say I wanted to join these two tables, I get the following error:
complete_orders <- orders %>% left_join(order_line)
> Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)
问题是,如果我要设置 copy = TRUE
,它将尝试下载整个 order_line
,我的计算机很快就会用完内存
The issue is, if I were to set copy = TRUE
, it would try to download the whole of order_line
and my computer would quickly run out of memory
另一种选择是将 orders
表上传到数据库。这里的问题是 PRODUCTION
数据库是只读的-我必须上载到其他数据库。尝试在dbplyr中跨数据库复制会导致相同的错误。
Another option could be to upload the orders
table to the database. The issue here is that the PRODUCTION
database is read only - I would have to upload to a different database. Trying to copy across databases in dbplyr results in the same error.
我发现的唯一解决方案是将其上传到可写数据库中并使用sql将它们连接起来,这是
The only solution I have found is to upload into the writable database and use sql to join them, which is far from ideal
推荐答案
我找到了答案,可以使用 in_schema()$ tbl指针中的c $ c>函数可跨同一连接内的模式进行工作
I have found the answer, you can use the in_schema()
function within the tbl pointer to work across schemas within the same connection
# Connect without specifying a database
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;UID=",
t2690_username,";PWD=",t2690_password, sep="")
# Upload the local table to the TEMP db then point to it
orders <- tbl(t2690, in_schema("TEMP", "orders"))
order_line <- tbl(t2690, in_schema("PRODUCTION", "order_line"))
complete_orders <- orders %>% left_join(order_line)
这篇关于使用dbplyr跨数据库联接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!