How to validate a Spark SQL expression without executing it?


Problem description


I want to validate whether a spark-sql query is syntactically correct, without actually running the query on the cluster.

The actual use case is that I am developing a user interface that lets a user enter a spark-sql query, and I should be able to verify whether the provided query is syntactically correct. Also, if after parsing the query I could give recommendations about it with respect to Spark best practices, that would be best.

Solution

SparkSqlParser

Spark SQL uses SparkSqlParser as the parser for Spark SQL expressions.

You can access SparkSqlParser using SparkSession (and SessionState) as follows:

val spark: SparkSession = ...
val parser = spark.sessionState.sqlParser

scala> parser.parseExpression("select * from table")
res1: org.apache.spark.sql.catalyst.expressions.Expression = ('select * 'from) AS table#0
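
Note that parseExpression parses a single expression, not a whole statement, which is why it accepts input that spark.sql would reject. For validating complete queries, the same parser exposes parsePlan, which is what spark.sql calls internally. A minimal driver-side syntax check might look like this (the helper name validateSyntax is made up for illustration):

```
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParseException

// Hypothetical helper: returns None when the statement parses,
// or the parser's error message otherwise. Parsing happens entirely
// on the driver; nothing is submitted to the cluster.
def validateSyntax(spark: SparkSession, sqlText: String): Option[String] =
  Try(spark.sessionState.sqlParser.parsePlan(sqlText)) match {
    case Success(_)                 => None
    case Failure(e: ParseException) => Some(e.getMessage)
    case Failure(e)                 => Some(e.getMessage)
  }
```

ParseException also carries line and position information (visible in the == SQL == block with the ^^^ marker further down), which is handy for highlighting the offending token in a UI.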


TIP: Enable INFO logging level for org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
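
Assuming the default log4j 1.x setup that Spark 2.x ships with, one way to follow that tip is to add a line to conf/log4j.properties:

```
log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=INFO
```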

SparkSession.sql Method

That alone won't give you the most bullet-proof shield against incorrect SQL expressions, and I think the sql method is a better fit.

sql(sqlText: String): DataFrame
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.

See both in action below.

scala> parser.parseExpression("hello world")
res5: org.apache.spark.sql.catalyst.expressions.Expression = 'hello AS world#2

scala> spark.sql("hello world")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'hello' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)

== SQL ==
hello world
^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
  ... 49 elided
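
Syntax checking alone will not catch a query that refers to a table or column that does not exist. spark.sql parses and analyzes the query eagerly, but it does not execute anything until an action (collect, show, ...) is called on the returned DataFrame, so for plain SELECT queries it can double as a catalog-aware validator. A sketch under that assumption (validateAgainstCatalog is a made-up name; beware that DDL and other commands are executed eagerly by spark.sql, so this is only safe for queries):

```
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.catalyst.parser.ParseException

// Sketch: lean on spark.sql's eager parse + analyze phases.
// No job runs until an action is invoked on the DataFrame,
// so a bad query fails here without touching the executors.
def validateAgainstCatalog(spark: SparkSession, sqlText: String): Either[String, Unit] =
  Try(spark.sql(sqlText)) match {
    case Success(_)                    => Right(())
    case Failure(e: ParseException)    => Left(s"Syntax error: ${e.getMessage}")
    case Failure(e: AnalysisException) => Left(s"Analysis error: ${e.getMessage}")
    case Failure(e)                    => Left(e.getMessage)
  }
```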
