如何在jdbc数据源中为dbtable选项使用子查询? [英] How to use a subquery for dbtable option in jdbc data source?

查看:346
本文介绍了如何在jdbc数据源中为dbtable选项使用子查询?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用Spark处理来自JDBC源的一些数据.但是首先,我不想在JDBC端运行一些查询来过滤列和联接表,并在Spark SQL中将查询结果作为表加载,而不是从JDBC读取原始表.

I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading original tables from JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and load the query result as a table in Spark SQL.

以下加载原始JDBC表的语法对我有用:

The following syntax to load raw JDBC table works for me:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="mydb.table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load() 
df_table1.show() # succeeded

根据Spark 文档(我正在使用PySpark 1.6.3):

According to Spark documentation (I'm using PySpark 1.6.3):

dbtable:应该读取的JDBC表.请注意,任何有效的 可以使用SQL查询的FROM子句中.例如,代替 全表中,您也可以在括号中使用子查询.

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.

因此,仅出于实验目的,我尝试了以下简单操作:

So just for experiment, I tried something simple like this:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="(SELECT * FROM mydb.table1) AS table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver"
).load() # failed

它引发了以下异常:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1

我还尝试了语法的其他一些变体(添加/删除括号,删除'as'子句,切换大小写等),但没有任何运气.那么正确的语法是什么?在哪里可以找到有关该语法的更详细的文档?此外,错误消息中的这个奇怪的"WHERE 1 = 0"是从哪里来的?谢谢!

I also tried a few other variations of the syntax (add / remove parentheses, remove 'as' clause, switch case, etc) without any luck. So what would be the correct syntax? Where can I find more detailed documentation for the syntax? Besides, where does this weird "WHERE 1=0" in error message come from? Thanks!

推荐答案

要在Spark SQL中使用sql查询从JDBC源读取数据,可以尝试执行以下操作:

For reading data from JDBC source using sql query in Spark SQL, you can try something like this:

val df_table1 = sqlContext.read.format("jdbc").options(Map(
    ("url" -> "jdbc:postgresql://localhost:5432/mydb"),
    ("dbtable" -> "(select * from table1) as table1"),
    ("user" -> "me"),
    ("password" -> "******"),
    ("driver" -> "org.postgresql.Driver"))
).load()

我使用PostgreSQL进行了尝试.您可以根据MySQL对其进行修改.

I tried it using PostgreSQL. You can modify it according to MySQL.

这篇关于如何在jdbc数据源中为dbtable选项使用子查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆