Meaning of Exchange in Spark Stage


Question

1). WholeStageCodeGen -> Exchange
2). Exchange -> WholeStageCodeGen -> SortAggregate -> Exchange

Recommended answer

Whole-stage code generation is a technique, inspired by modern compilers, that collapses an entire query into a single function. Before whole-stage code generation, each physical plan operator was a class containing the code that defined its execution. With whole-stage code generation, all the physical plan nodes in a plan tree work together to generate Java code in a single function for execution. This Java code is then compiled to JVM bytecode by Janino, a fast Java compiler. Finally, the JVM JIT kicks in to optimize the bytecode further and eventually compiles it into machine instructions.
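To make the contrast concrete, here is a minimal sketch in plain Python rather than Spark. It runs the same "filter, then count" query twice: once as a chain of operator objects that pull rows through next() calls, and once as a single fused loop of the kind a code generator aims to emit. The class and function names (Scan, Filter, Count, count_iterator_model, count_fused) are hypothetical and only illustrate the idea; Spark's real operators and generated Java code look different.

```python
# Illustration only: plain Python stand-ins, not Spark's actual operators.

class Scan:
    """Leaf operator: hands out rows one at a time."""
    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end of input


class Filter:
    """Pulls rows from its child and returns the first one that matches."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


class Count:
    """Consumes its child completely and returns the number of rows."""
    def __init__(self, child):
        self.child = child

    def result(self):
        n = 0
        while self.child.next() is not None:
            n += 1
        return n


def count_iterator_model(rows, key):
    # One object per operator; every row crosses several next() calls.
    plan = Count(Filter(Scan(rows), lambda r: r["id"] == key))
    return plan.result()


def count_fused(rows, key):
    # The same query collapsed into one tight loop, with no operator objects.
    n = 0
    for r in rows:
        if r["id"] == key:
            n += 1
    return n


rows = [{"id": i % 4, "token": f"t{i}"} for i in range(20)]
print(count_iterator_model(rows, 1), count_fused(rows, 1))
```

Both functions return the same count; the fused version simply avoids the per-row method calls between operators, which is the overhead whole-stage code generation is meant to remove.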

For example:

== Physical Plan ==
*Project [id#27, token#28, token#6]
+- *SortMergeJoin [id#27], [id#5], Inner
   :- *Sort [id#27 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#27, 200)

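Wherever you see *, it means that whole-stage code generation covers that operator: Spark has generated fused code for it, comparable to hand-written code, instead of running it as a separate operator class. Exchange means a shuffle exchange: rows are repartitioned (here by the hash of id into 200 partitions), and the shuffle marks the boundary between stages, not between jobs. That is why a stage in the DAG typically ends with an Exchange (the shuffle write) and the next stage begins with an Exchange (the shuffle read). Exchange does not get whole-stage code generation because it sends data across the network.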
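As a rough sketch of what Exchange hashpartitioning(id#27, 200) does to the data, the plain Python below routes each row to one of 200 partitions based on a hash of its id, so rows with equal ids from different upstream tasks end up together. The helper names partition_for and shuffle are hypothetical, and CRC32 is only a stand-in hash chosen so the example is deterministic; Spark uses its own Murmur3-based hash, so the actual partition numbers will differ.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 200  # mirrors hashpartitioning(id#27, 200) in the plan


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash (assumption): Spark uses a Murmur3-based hash, not CRC32.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle(map_outputs, num_partitions=NUM_PARTITIONS):
    """Regroup rows from every upstream task by the hash of their id."""
    reduce_inputs = defaultdict(list)
    for task_rows in map_outputs:
        for row in task_rows:
            reduce_inputs[partition_for(row["id"], num_partitions)].append(row)
    return dict(reduce_inputs)


# Two upstream tasks, each holding rows with overlapping ids.
map_outputs = [
    [{"id": 1, "token": "a"}, {"id": 2, "token": "b"}],
    [{"id": 1, "token": "c"}, {"id": 3, "token": "d"}],
]
partitions = shuffle(map_outputs)

# Both rows with id 1 now sit in the same downstream partition.
target = partition_for(1)
print([r["token"] for r in partitions[target] if r["id"] == 1])
```

Once equal ids are co-located like this, the downstream stage can sort and merge-join each partition independently, which is why the Sort and SortMergeJoin in the plan sit above the Exchange.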
