Transforming Spark SQL AST with extraOptimizations


Question

I want to take a SQL string as user input, then transform it before execution. In particular, I want to modify the top-level projection (the select clause), injecting additional columns to be retrieved by the query.

I was hoping to achieve this by hooking into Catalyst using sparkSession.experimental.extraOptimizations. I know that what I'm attempting isn't strictly speaking an optimisation (the transformation changes the semantics of the SQL statement), but the API still seems suitable. However, my transformation seems to be ignored by the query executor.

Here is a minimal example to illustrate the issue I'm having. First define a row case class:

case class TestRow(a: Int, b: Int, c: Int)

Then define an optimisation rule which simply discards any projection:

import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule

object RemoveProjectOptimisationRule extends Rule[LogicalPlan] {
    def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
        // Replace every Project node with its child, discarding the projection.
        case x: Project => x.child
    }
}

Now create a dataset, register the optimisation, and run a SQL query:

// Needed for the toDS() conversion below.
import sparkSession.implicits._

// Create a dataset and register a temp view.
val dataset = List(TestRow(1, 2, 3)).toDS()
val tableName: String = "testtable"
dataset.createOrReplaceTempView(tableName)

// Register "optimisation".
sparkSession.experimental.extraOptimizations =  
    Seq(RemoveProjectOptimisationRule)

// Run query.
val projected = sparkSession.sql("SELECT a FROM " + tableName + " WHERE a = 1")

// Print query result and the queryExecution object.
println("Query result:")
projected.collect.foreach(println)
println(projected.queryExecution)

Here is the output:

Query result: 
[1]

== Parsed Logical Plan ==
'Project ['a]
+- 'Filter ('a = 1)
   +- 'UnresolvedRelation `testtable`

== Analyzed Logical Plan ==
a: int
Project [a#3]
+- Filter (a#3 = 1)
   +- SubqueryAlias testtable
      +- LocalRelation [a#3, b#4, c#5]

== Optimized Logical Plan ==
Filter (a#3 = 1)
+- LocalRelation [a#3, b#4, c#5]

== Physical Plan ==
*Filter (a#3 = 1)
+- LocalTableScan [a#3, b#4, c#5]

We see that the result is identical to that of the original SQL statement, without the transformation applied. Yet, when printing the logical and physical plans, the projection has indeed been removed. I've also confirmed (through debug log output) that the transformation is indeed being invoked.

Any suggestions as to what's going on here? Maybe the optimiser simply ignores "optimisations" that change semantics?

If using the optimisations isn't the way to go, can anybody suggest an alternative? All I really want to do is parse the input SQL statement, transform it, and pass the transformed AST to Spark for execution. But as far as I can see, the APIs for doing this are private to the Spark sql package. It may be possible to use reflection, but I'd like to avoid that.

Any pointers would be greatly appreciated.
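[Editor's note: one workaround that avoids private APIs entirely is to let Spark parse the user's SQL and then inject the extra columns at the DataFrame level. This is a sketch, not part of the original post; `userSql` and the `injected` column are illustrative names.]

```scala
import org.apache.spark.sql.functions.lit

// Let Spark parse the user's SQL as-is, then append columns to the
// resulting DataFrame instead of rewriting the AST.
val userSql = "SELECT a FROM testtable WHERE a = 1"
val withExtras = sparkSession.sql(userSql)
  .withColumn("injected", lit(42)) // extra column added to the projection

withExtras.show()
```

This only covers appending to the top-level projection; it cannot restructure the query the way a true plan rewrite could.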

Answer

As you guessed, this fails because we assume the optimizer will not change the results of the query.

Specifically, we cache the schema that comes out of the analyzer (and assume the optimizer does not change it). When translating rows to the external format, we use this schema, and thus truncate the columns in the result. If your rule did more than truncate (i.e. changed datatypes), this might even crash.
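[Editor's note: this cached-schema behaviour can be observed directly by comparing the output schemas of the analyzed and optimized plans, reusing the `projected` Dataset from the question. A sketch:]

```scala
// The analyzed plan's schema is what row conversion uses: just column `a`.
println(projected.queryExecution.analyzed.schema)

// The optimized plan (with the Project removed by the rule above) exposes
// all three columns, but the extras are truncated away during conversion.
println(projected.queryExecution.optimizedPlan.schema)
```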

As you can see in this notebook, it is in fact producing the result you would expect under the covers. We are planning to open up more hooks at some point in the near future that would let you modify the plan at other phases of query execution. See SPARK-18127 for more details.
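[Editor's note: SPARK-18127 has since shipped, as I understand it in Spark 2.2, as `SparkSessionExtensions`. A sketch of injecting the question's rule at resolution time, before the analyzer's output schema is cached, assuming `RemoveProjectOptimisationRule` from above is in scope:]

```scala
import org.apache.spark.sql.SparkSession

// Rules injected at resolution time run inside the analyzer, so the
// rewritten plan's schema is the one the rest of the pipeline sees.
val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions { extensions =>
    extensions.injectResolutionRule { session => RemoveProjectOptimisationRule }
  }
  .getOrCreate()
```

With this hook the semantics-changing rewrite is applied before the schema is fixed, avoiding the truncation described above.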
