Is there an "Explain RDD" in Spark?

Published: 2020/9/4 6:44:07 | Tags: apache-spark, rdd

Question

In particular, if I say

rdd3 = rdd1.join(rdd2)

then, when I call rdd3.collect, depending on the Partitioner used, either data is shuffled between partitions on different nodes, or the join is done locally on each partition (or, for all I know, something else entirely). This depends on what the RDD paper calls "narrow" and "wide" dependencies, but who knows how good the optimizer is in practice.
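One way to tell which case you got, without reading trace output, is to walk the RDD's dependencies and look for ShuffleDependency edges (wide) versus one-to-one ones (narrow). The sketch below assumes a local Spark 2.x/3.x setup; the object JoinDependencies and its helpers shuffleIds, newShuffles and compare are hypothetical, while RDD.dependencies, ShuffleDependency, partitionBy and HashPartitioner are Spark APIs. It contrasts a join of two unpartitioned RDDs with a join of two RDDs pre-partitioned by the same HashPartitioner.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable

// Hypothetical helper object: not part of Spark, just a sketch on top of RDD.dependencies.
object JoinDependencies {
  // Collect the ids of every shuffle in the lineage of `rdd`, visiting each RDD once.
  def shuffleIds(rdd: RDD[_]): Set[Int] = {
    val seen  = mutable.Set[Int]() // RDD ids already visited
    val found = mutable.Set[Int]() // shuffle ids discovered so far
    def visit(r: RDD[_]): Unit =
      if (seen.add(r.id)) r.dependencies.foreach { dep =>
        dep match {
          case s: ShuffleDependency[_, _, _] => found += s.shuffleId // wide dependency
          case _                             => ()                   // narrow dependency
        }
        visit(dep.rdd)
      }
    visit(rdd)
    found.toSet
  }

  // Shuffles introduced by the join itself, i.e. not already present in its inputs.
  def newShuffles(joined: RDD[_], inputs: RDD[_]*): Set[Int] =
    shuffleIds(joined) -- inputs.flatMap(shuffleIds)

  // Returns (new shuffles for a plain join, new shuffles for a co-partitioned join).
  def compare(sc: SparkContext): (Int, Int) = {
    val rdd1 = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    val rdd2 = sc.parallelize(Seq(1 -> "x", 2 -> "y"))

    // Neither input has a partitioner, so the CoGroupedRDD shuffles both of them.
    val plain = rdd1.join(rdd2)

    // Pre-partition both sides with the same partitioner; joining with that
    // partitioner gives one-to-one (narrow) dependencies into the CoGroupedRDD.
    val p     = new HashPartitioner(4)
    val left  = rdd1.partitionBy(p)
    val right = rdd2.partitionBy(p)
    val coPartitioned = left.join(right, p)

    (newShuffles(plain, rdd1, rdd2).size, newShuffles(coPartitioned, left, right).size)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-deps").setMaster("local[2]"))
    try println(compare(sc)) finally sc.stop()
  }
}
```

Running main should print (2,0): the plain join shuffles both inputs, while the co-partitioned join adds no shuffle beyond the partitionBy calls that produced its inputs. Nothing here triggers a job; inspecting dependencies only builds the lineage.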

Anyways, I can kind of glean from the trace output which thing actually happened, but it would be nice to call rdd3.explain.

Does something like that exist?

Recommended answer

I think toDebugString will appease your curiosity.

scala> val data = sc.parallelize(List((1,2)))
data: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[8] at parallelize at <console>:21

scala> val joinedData = data join data
joinedData: org.apache.spark.rdd.RDD[(Int, (Int, Int))] = MapPartitionsRDD[11] at join at <console>:23

scala> joinedData.toDebugString
res4: String =
(8) MapPartitionsRDD[11] at join at <console>:23 []
 |  MapPartitionsRDD[10] at join at <console>:23 []
 |  CoGroupedRDD[9] at join at <console>:23 []
 +-(8) ParallelCollectionRDD[8] at parallelize at <console>:21 []
 +-(8) ParallelCollectionRDD[8] at parallelize at <console>:21 []

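Each indentation level is a stage: a new level starts at each +- line, which marks a shuffle dependency. So here the join is not done locally on each partition; both inputs are shuffled first, and the work runs in separate stages on either side of that shuffle.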
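If you want to check this in code rather than by eye, a rough heuristic is to count the +- lines in toDebugString, since each one marks a shuffle input. The sketch below reproduces the data join data session as a standalone program; the object DebugStringShuffles and its shuffleMarkers helper are hypothetical, and counting text markers relies on the current toDebugString output format, which is not a stable API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: parses toDebugString, whose format is not a stable API.
object DebugStringShuffles {
  // Count lines that open a new indentation level ("+-"), i.e. shuffle inputs.
  def shuffleMarkers(rdd: RDD[_]): Int =
    rdd.toDebugString.split("\n").count(_.contains("+-"))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("debug-string").setMaster("local[2]"))
    try {
      val data = sc.parallelize(List((1, 2)))
      val joinedData = data join data
      println(joinedData.toDebugString)
      println(s"shuffle inputs: ${shuffleMarkers(joinedData)}")
    } finally sc.stop()
  }
}
```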

Also, the optimizer is fairly decent, but if you are on Spark 1.3+ I would suggest using DataFrames, as the optimizer there is even better in many cases :)
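DataFrames also give you the explain you were asking for. A minimal sketch, assuming Spark 2.0+ (where SparkSession replaced the 1.3-era SQLContext as the entry point); the object DataFrameExplain and the column names are hypothetical. explain(true) prints the parsed, analyzed and optimized logical plans and the physical plan, where the chosen join strategy and any Exchange (shuffle) operators show up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; only the SparkSession/DataFrame calls are Spark APIs.
object DataFrameExplain {
  // Builds two small DataFrames and joins them on "id"; nothing executes until an action.
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "r")
    left.join(right, "id")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("explain-demo").master("local[2]").getOrCreate()
    try {
      // Prints the logical plans (parsed, analyzed, optimized) and the physical plan.
      joined(spark).explain(true)
    } finally spark.stop()
  }
}
```

With inputs this small the physical plan will typically show a broadcast hash join rather than a shuffle-based sort-merge join, which is exactly the kind of decision toDebugString cannot show for plain RDDs.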
