Meaning of Exchange in Spark Stage


Question

1). WholeStageCodeGen -> Exchange
2). Exchange -> WholeStageCodeGen -> SortAggregate -> Exchange

Recommended answer

Whole-stage code generation is a technique inspired by modern compilers that collapses the entire query into a single function. Prior to whole-stage code generation, each physical plan operator was a class whose code defined its execution. With whole-stage code generation, all the physical plan nodes in a plan tree work together to generate Java code in a single function for execution. This Java code is then turned into JVM bytecode using Janino, a fast Java compiler. The JVM JIT then kicks in to optimize the bytecode further and eventually compiles it into machine instructions.
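The difference between one class per operator and a single fused function can be sketched outside Spark. The following is only a conceptual illustration in plain Python, not Spark's generated code: the `Scan`, `Filter`, and `Project` classes and both function names are hypothetical, chosen to mirror the idea of chaining operator objects versus collapsing them into one loop.

```python
from typing import Callable, Iterable, Iterator


class Scan:
    """Leaf operator: yields rows from an in-memory source."""

    def __init__(self, rows: Iterable[int]):
        self.rows = rows

    def __iter__(self) -> Iterator[int]:
        return iter(self.rows)


class Filter:
    """Passes through only the rows for which the predicate holds."""

    def __init__(self, child: Iterable[int], predicate: Callable[[int], bool]):
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            if self.predicate(row):
                yield row


class Project:
    """Applies a function to every row coming from the child operator."""

    def __init__(self, child: Iterable[int], fn: Callable[[int], int]):
        self.child = child
        self.fn = fn

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            yield self.fn(row)


def operator_at_a_time(rows: Iterable[int]) -> int:
    """One object per operator: every row crosses several call boundaries."""
    plan = Project(Filter(Scan(rows), lambda x: x % 2 == 0), lambda x: x * 10)
    return sum(plan)


def fused(rows: Iterable[int]) -> int:
    """The same query collapsed into a single loop, as codegen aims to do."""
    total = 0
    for x in rows:
        if x % 2 == 0:
            total += x * 10
    return total
```

Both functions compute the same result; the fused version simply avoids the per-row hand-off between operator objects, which is the kind of overhead whole-stage code generation is meant to remove.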

For example:

== Physical Plan ==
*Project [id#27, token#28, token#6]
+- *SortMergeJoin [id#27], [id#5], Inner
   :- *Sort [id#27 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#27, 200)

Wherever you see a *, the operator is covered by whole-stage code generation: Spark has generated fused code for it, similar to what you would write by hand. Exchange is the shuffle exchange, which marks the boundary between stages. Exchange does not have whole-stage code generation because it sends data across the network.
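To make the `Exchange hashpartitioning(id#27, 200)` step more concrete, here is a minimal sketch of hash partitioning in plain Python. It is an illustration under stated assumptions, not Spark's implementation: the `hash_partition` function and the sample rows are hypothetical, and Python's built-in `hash` stands in for whatever hash function Spark applies internally. The 200 matches the partition count shown in the plan.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, token), mirroring the columns in the plan


def hash_partition(rows: List[Row], num_partitions: int) -> Dict[int, List[Row]]:
    """Group rows by hash(id) modulo num_partitions.

    Assumption: Python's hash() is a stand-in for Spark's own hash function.
    """
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        key = row[0]
        partitions[hash(key) % num_partitions].append(row)
    return dict(partitions)


# Hypothetical sample data: two rows share id 1, so they must end up together.
rows = [(1, "a"), (2, "b"), (1, "c"), (202, "d")]
parts = hash_partition(rows, 200)
```

Rows with the same `id` always land in the same partition, which is what lets the downstream `Sort` and `SortMergeJoin` work on each partition independently. Moving rows to their target partitions is the shuffle, and that data movement is why Exchange sits between stages.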
