dplyr sql连接 [英] dplyr sql joins
问题描述
考虑以下SQL:
SELECT D.product_name
FROM business.payment P
LEFT JOIN dim.product_name D
ON D.product_id = P.product_id
查询返回
如何在dplyr中复制这样的内容而不将其存入内存?我正在使用数据库连接。
How would something like this be replicated in dplyr without pulling into memory? I'm working with a database connection.
我尝试了以下操作,但无济于事:
I tried the following, but to no avail:
product_name <-
business %>%
tbl('dim_product')
business %>%
tbl('payment') %>%
left_join(product_name, by = 'product_id') %>%
select(product_name) %>%
collect()
我搜索了很多内容,似乎没有人解决这个问题。
I've searched quite a big and no one seems to have addressed this.
谢谢!
推荐答案
这已经过了一段时间,但也许您仍然在寻找或好奇 dplyr
基于动词的选项。我在处理同一问题时遇到了您(某种)未解决的问题。当我使用 DBI
和 odbc
软件包在MSSQL数据库上运行它时,以下内容对我有用。
This is a while after the fact but maybe you're still looking or curious for a dplyr
verb based option. I was working on the same issue for my work and came across your (sort of) unanswered question. Below works for me when I run it against an MSSQL database using DBI
and odbc
packages.
在连接数据库之前,我从表中选择了感兴趣的列,因为通常这是查询数据库时的最佳实践。 dplyr
连接函数默认情况下会进行自然连接,因此您可能不必显式提供 by
参数。
I selected the columns of interest from the tables before joining as that is generally best practice when querying databases. dplyr
join functions will do a natural join by default so you might not have to provide the by
argument explicitly.
db_con <- DBI::dbConnect(
drv = odbc::odbc(),
dsn = <data source name>
)
db_con %>%
tbl("table1") %>%
select(col1, col2, col3) %>%
left_join(
db_con %>% tbl("table2") %>% select(col3, col4, col5)
)
这篇关于dplyr sql连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!