How to validate Spark SQL expression without executing it?
Question
I want to validate if spark-sql query is syntactically correct or not without actually running the query on the cluster.
Actual use case is that I am trying to develop a user interface which lets a user enter a spark-sql query, and I should be able to verify whether the query provided is syntactically correct. Also, if after parsing the query I could give any recommendations about it with respect to Spark best practices, that would be best.
Answer
SparkSqlParser
Spark SQL uses SparkSqlParser as the parser for Spark SQL expressions.
You can access SparkSqlParser using SparkSession (and SessionState) as follows:
val spark: SparkSession = ...
val parser = spark.sessionState.sqlParser
scala> parser.parseExpression("select * from table")
res1: org.apache.spark.sql.catalyst.expressions.Expression = ('select * 'from) AS table#0
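Note that parseExpression parses a single expression, which is why a whole query like the one above produces an odd-looking result rather than an error. The same parser also exposes parsePlan, which parses a full SQL statement into a logical plan without executing anything. A minimal sketch (assuming a live SparkSession named spark; the table name t is illustrative):

```scala
import org.apache.spark.sql.catalyst.parser.ParseException

val parser = spark.sessionState.sqlParser

// Parses the full statement into a logical plan; nothing is executed.
// Syntax errors raise ParseException; unknown tables do NOT fail here,
// because table resolution happens later, at analysis time.
val plan = parser.parsePlan("select * from t")

// An invalid statement would throw:
// parser.parsePlan("hello world")   // => ParseException
```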
TIP: Enable INFO logging level for the org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
That alone won't give you the most bullet-proof shield against incorrect SQL expressions, and I think the sql method is a better fit.
sql(sqlText: String): DataFrame Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
See it in action below.
scala> parser.parseExpression("hello world")
res5: org.apache.spark.sql.catalyst.expressions.Expression = 'hello AS world#2
scala> spark.sql("hello world")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'hello' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)
== SQL ==
hello world
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
... 49 elided
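For the UI use case in the question, the parse call can be wrapped so that the user gets the error message instead of a stack trace. A sketch (the helper name validateSyntax is illustrative, not a Spark API; it checks syntax only, so unresolved tables or columns would still surface later during analysis):

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.SparkSession

// Returns Right(()) when the statement parses, Left(message) otherwise.
def validateSyntax(spark: SparkSession, sqlText: String): Either[String, Unit] =
  Try(spark.sessionState.sqlParser.parsePlan(sqlText)) match {
    case Success(_) => Right(())
    case Failure(e) => Left(e.getMessage)  // e.g. "mismatched input 'hello' ..."
  }
```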