dplyr sql连接 [英] dplyr sql joins

查看:100
本文介绍了dplyr sql连接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑以下SQL:

    SELECT D.product_name
      FROM business.payment P
 LEFT JOIN dim.product_name D 
        ON D.product_id = P.product_id

查询返回

如何在dplyr中复制这样的内容而不将其存入内存?我正在使用数据库连接。

How would something like this be replicated in dplyr without pulling into memory? I'm working with a database connection.

我尝试了以下操作,但无济于事:

I tried the following, but to no avail:

product_name <- 
  business %>% 
  tbl('dim_product') 

business %>% 
  tbl('payment') %>% 
  left_join(product_name, by = 'product_id') %>% 
  select(product_name) %>% 
  collect()

我搜索了很多内容,似乎没有人解决这个问题。

I've searched quite a big and no one seems to have addressed this.

谢谢!

推荐答案

这已经过了一段时间,但也许您仍然在寻找或好奇 dplyr 基于动词的选项。我在处理同一问题时遇到了您(某种)未解决的问题。当我使用 DBI odbc 软件包在MSSQL数据库上运行它时,以下内容对我有用。

This is a while after the fact but maybe you're still looking or curious for a dplyr verb based option. I was working on the same issue for my work and came across your (sort of) unanswered question. Below works for me when I run it against an MSSQL database using DBI and odbc packages.

在连接数据库之前,我从表中选择了感兴趣的列,因为通常这是查询数据库时的最佳实践。 dplyr 连接函数默认情况下会进行自然连接,因此您可能不必显式提供 by 参数。

I selected the columns of interest from the tables before joining as that is generally best practice when querying databases. dplyr join functions will do a natural join by default so you might not have to provide the by argument explicitly.

db_con <- DBI::dbConnect(
  drv = odbc::odbc(),
  dsn = <data source name>
)

db_con %>%
  tbl("table1") %>%
  select(col1, col2, col3) %>%
  left_join(
    db_con %>% tbl("table2") %>% select(col3,  col4, col5)
  )

这篇关于dplyr sql连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆