Whole-Stage Code Generation in Spark 2.0
Question
I heard about Whole-Stage Code Generation for optimizing SQL queries, via p539-neumann.pdf & sparksql-sql-codegen-is-not-giving-any-improvemnt.
But unfortunately no one answered that question.
I'm curious about the scenarios in which to use this Spark 2.0 feature, but I couldn't find a proper use case by googling.
Can we use this feature whenever we use SQL? If so, is there a proper use case that shows it working?
Answer
When you are using Spark 2.0, code generation is enabled by default, which allows most DataFrame queries to take advantage of the performance improvements. There are some potential exceptions, such as Python UDFs, that may slow things down.
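Because the feature is on by default, you normally don't need to do anything to use it. A minimal sketch, assuming a running `SparkSession` named `spark` (as in `spark-shell`), shows how to check and toggle the flag that controls it:

```scala
// Whole-stage codegen is governed by the spark.sql.codegen.wholeStage
// configuration, which defaults to true in Spark 2.0+.
val enabled = spark.conf.get("spark.sql.codegen.wholeStage")
println(s"whole-stage codegen enabled: $enabled")

// It can be turned off, e.g. to benchmark a query with and without codegen:
spark.conf.set("spark.sql.codegen.wholeStage", "false")
// ... run the query ...
spark.conf.set("spark.sql.codegen.wholeStage", "true")
```

Comparing the same query's runtime with the flag on and off is the simplest way to see whether it benefits your workload.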
Code generation is one of the primary components of the Spark SQL engine's Catalyst Optimizer. In brief, the Catalyst Optimizer engine does the following: (1) analyzes a logical plan to resolve references, (2) optimizes the logical plan, (3) performs physical planning, and (4) generates code.
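You can see step (4) in action with `explain()`: in Spark 2.x, physical-plan operators that have been collapsed into a single generated Java function are prefixed with `*`. A sketch, again assuming a `spark` session in `spark-shell`:

```scala
// A simple aggregation query; the numbers and column name are arbitrary.
val df = spark.range(0, 1000000).selectExpr("id % 10 AS k").groupBy("k").count()

// Operators printed with a leading `*` (e.g. *HashAggregate, *Project, *Range)
// are running inside whole-stage-generated code.
df.explain()

// To dump the generated Java source itself:
import org.apache.spark.sql.execution.debug._
df.debugCodegen()
```

If you disable `spark.sql.codegen.wholeStage` and run `explain()` again, the `*` prefixes disappear, which makes the effect of the feature easy to spot.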
A great reference for all of this is the blog post
Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
HTH!