How to join tables from different SQL databases using R and dplyr?

Question

I'm using dplyr (0.7.0), dbplyr (1.0.0), DBI (0.6-1), and odbc (1.0.1.9000). I would like to do something like the following:

library(dplyr)

db1 <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "MyServer",
  Database = "DB1"
)
db2 <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "MyServer",
  Database = "DB2"
)
x <- tbl(db1, "Table1") %>%
  dplyr::left_join(tbl(db2, "Table2"), by = "JoinColumn")

but I keep getting an error that doesn't really seem to have any substance to it. When I use show_query(), it looks like the generated SQL joins the two tables without taking the separate databases into account. Per the documentation for dplyr::left_join, I've also tried:

x <- tbl(db1, "Table1") %>%
  dplyr::left_join(tbl(db2, "Table2"), by = "JoinColumn", copy = TRUE)

But there is no change in the output or error message. Is there a different way to join tables from separate databases on the same server?

Answer

I'm assuming from the code you provided that (a) you're interested in joining the two tbl objects via dplyr's syntax before you run collect() and pull the results into local memory, and (b) you want to refer to the database objects directly in the call to tbl().

These choices matter if you want to use dplyr to build your query logic programmatically while letting the database server INNER JOIN large volumes of data down to the set you're interested in. (Or at least that's why I ended up here.)
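
For contrast, here is what the copy = TRUE route from the question does when it takes effect, as I understand current dplyr/dbplyr releases: because the two tbl objects come from different connections, dplyr collects the right-hand table into R, uploads it into the left-hand table's database as a temporary table, and only then runs the join there. That round trip through local memory is what you want to avoid with large tables. The sketch below assumes two in-memory SQLite connections (chosen only so it runs without SQL Server) with made-up contents for Table1 and Table2; it does not reproduce the odbc error from the question.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical stand-ins for db1 and db2: two independent in-memory SQLite connections.
db1 <- dbConnect(RSQLite::SQLite(), ":memory:")
db2 <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db1, "Table1", data.frame(JoinColumn = 1:3, a = c("x", "y", "z")))
dbWriteTable(db2, "Table2", data.frame(JoinColumn = c(1L, 3L), b = c(10, 30)))

# The tables live on different connections, so copy = TRUE makes dplyr pull
# Table2 into R and copy it into db1 as a temporary table before joining.
joined <- tbl(db1, "Table1") %>%
  left_join(tbl(db2, "Table2"), by = "JoinColumn", copy = TRUE) %>%
  arrange(JoinColumn) %>%
  collect()

dbDisconnect(db1)
dbDisconnect(db2)
```

This works for small lookup tables, but the whole right-hand table travels through R, which is why the single-connection approach is preferable when both databases sit on the same server.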

The solution I found uses one connection without specifying the database, and spells out the database and schema information using in_schema() (I couldn't find this documented in a help page or vignette anywhere):

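library(dplyr)
library(dbplyr)  # in_schema() and src_dbi()

# One connection to the server, with no Database specified
conn <- DBI::dbConnect(
  odbc::odbc(),
  Driver = "SQL Server",
  Server = "MyServer"
)

# Table2 lives in DB2, so it is qualified with DB2.dbo rather than DB1.dbo
x <- tbl(src_dbi(conn),
         in_schema("DB1.dbo", "Table1")) %>%
  dplyr::left_join(tbl(src_dbi(conn),
                       in_schema("DB2.dbo", "Table2")),
                   by = "JoinColumn")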
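
The same single-connection idea can be tried locally without SQL Server. The sketch below is a stand-in under stated assumptions: it uses SQLite, where ATTACH DATABASE exposes a second database file on the same connection under a schema name (here the hypothetical db2), so in_schema("db2", "Table2") plays the role of in_schema("DB2.dbo", "Table2"). The file paths, table names, and data are made up for illustration.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical stand-ins for DB1 and DB2: two separate SQLite database files.
path1 <- tempfile(fileext = ".sqlite")
path2 <- tempfile(fileext = ".sqlite")

con1 <- dbConnect(RSQLite::SQLite(), path1)
dbWriteTable(con1, "Table1", data.frame(JoinColumn = 1:3, a = c("x", "y", "z")))
dbDisconnect(con1)

con2 <- dbConnect(RSQLite::SQLite(), path2)
dbWriteTable(con2, "Table2", data.frame(JoinColumn = c(1L, 3L), b = c(10, 30)))
dbDisconnect(con2)

# A single connection: open DB1's file and attach DB2's file as schema "db2".
conn <- dbConnect(RSQLite::SQLite(), path1)
dbExecute(conn, paste0("ATTACH DATABASE ", dbQuoteString(conn, path2), " AS db2"))

# Both tbl() calls share conn, so the join is rendered as one SQL query.
x <- tbl(conn, "Table1") %>%
  left_join(tbl(conn, in_schema("db2", "Table2")), by = "JoinColumn")

show_query(x)

# Only now is the joined result pulled into local memory.
result <- x %>% arrange(JoinColumn) %>% collect()
dbDisconnect(conn)
```

Because both tables are reached through one connection, show_query(x) prints a single SELECT that references `db2`.`Table2`, and the join runs inside the database until collect() is called. On newer dbplyr releases in_schema() quotes its arguments, so a dotted name like "DB1.dbo" may end up quoted as a single identifier; as far as I know, in_catalog("DB1", "dbo", "Table1") is the intended way to spell out a three-part SQL Server name there (check the documentation for your installed version).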
