Why does Spark append 'WHERE 1=0' at the end of a SQL query?


Question

I am trying to execute a simple MySQL query using Apache Spark and create a data frame. But for some reason Spark appends 'WHERE 1=0' to the end of the query I want to execute and throws an exception stating 'You have an error in your SQL syntax'.

val spark = SparkSession.builder.master("local[*]").appName("rddjoin").getOrCreate()
val mhost = "jdbc:mysql://localhost:3306/registry"
val mprop = new java.util.Properties
mprop.setProperty("driver", "com.mysql.jdbc.Driver")
mprop.setProperty("user", "root")
mprop.setProperty("password", "root")
val q = """select id from loaded_item"""
val res = spark.read.jdbc(mhost, q, mprop)
res.show(10)

The exception is as follows:

18/02/16 17:53:49 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'select id from loaded_item WHERE 1=0' at line 1
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:62)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
    at GenerateReport$.main(GenerateReport.scala:46)
    at GenerateReport.main(GenerateReport.scala)
18/02/16 17:53:50 INFO SparkContext: Invoking stop() from shutdown hook

Answer

The second parameter of your call to spark.read.jdbc is not correct. Instead of specifying a raw SQL query, you should pass either a table name qualified with the schema, or a valid SQL query wrapped in parentheses with an alias. In your case this would be val q = "registry.loaded_item". Another option, if you want to provide additional parameters (for example for a where clause), is to use one of the other overloads of DataFrameReader.jdbc.
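Both options can be sketched like this (assuming the same local registry database and loaded_item table from the question; this is not runnable without such a server):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a local MySQL "registry" database with a
// "loaded_item" table, as in the question.
val spark = SparkSession.builder.master("local[*]").appName("rddjoin").getOrCreate()
val mhost = "jdbc:mysql://localhost:3306/registry"
val mprop = new java.util.Properties
mprop.setProperty("driver", "com.mysql.jdbc.Driver")
mprop.setProperty("user", "root")
mprop.setProperty("password", "root")

// Option 1: pass a schema-qualified table name instead of a query.
val byTable = spark.read.jdbc(mhost, "registry.loaded_item", mprop)

// Option 2: wrap the query in parentheses and give it an alias, so it
// remains valid SQL wherever Spark embeds it as a subquery.
val bySubquery = spark.read.jdbc(mhost, "(select id from loaded_item) as t", mprop)
bySubquery.show(10)
```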

Btw: the reason you see the strange-looking query WHERE 1=0 is that Spark tries to infer the schema of your data frame without loading any actual data. This query is guaranteed never to return any rows, but Spark can use the metadata of the (empty) result to obtain the schema information.
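Concretely, during schema inference Spark issues a probe query of roughly this shape (the exact text varies by Spark version; this is the pattern, not a verbatim quote):

```sql
-- Spark substitutes whatever you passed as the table parameter and
-- appends WHERE 1=0. With the raw query from the question this yields
-- the invalid statement seen in the stack trace:
--   select id from loaded_item WHERE 1=0
-- With a parenthesised, aliased subquery it stays valid:
SELECT * FROM (select id from loaded_item) as t WHERE 1=0
```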

