Why is the Apache Spark take function not parallel?

Question

The Apache Spark programming guide at http://spark.apache.org/docs/latest/programming-guide.html states that take is not executed in parallel.

Why is the take function not run in parallel? What are the difficulties in implementing this type of function in parallel? Does it have to do with the fact that taking the first n elements of an RDD requires traversing the entire RDD?

Recommended answer

Actually, while take is not entirely parallel, it's not entirely sequential either.

For example, say you call take(200) and each partition has 10 elements. take first fetches partition 0 and sees that it has 10 elements. It estimates that it would need 20 such partitions to get 200 elements, but it is better to ask for somewhat more in a single parallel request. So it wants 30 partitions, and it already has 1, so it fetches partitions 1 to 29 next, in parallel. This will likely be the last step. If it is very unlucky and still has not found 200 elements in total, it makes a new estimate and requests another batch in parallel.
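The batching described above can be sketched as a small pure-Python simulation. This is not Spark's code: simulated_take is a hypothetical name, the partitions are plain lists, the "parallel" batch is just a loop, and the rule of quadrupling the batch when nothing has been found yet is an assumption that the answer does not spell out. It only follows the arithmetic of the example (estimate, overestimate by 50%, subtract what is already scanned).

```python
def simulated_take(partitions, n):
    """Return (first n elements, list of partition-index batches fetched).

    Illustrative only: each batch stands in for one parallel job.
    """
    buf = []          # elements collected so far
    batches = []      # which partition indices each pass fetched
    scanned = 0       # number of partitions already fetched
    total = len(partitions)

    while len(buf) < n and scanned < total:
        if scanned == 0:
            to_try = 1                      # first pass: partition 0 only
        elif not buf:
            to_try = scanned * 4            # assumption: nothing found yet, grow the batch
        else:
            # Estimate partitions needed from the density seen so far,
            # overestimate by 50%, then subtract what is already scanned.
            wanted = int(1.5 * n * scanned / len(buf))
            to_try = max(wanted - scanned, 1)

        batch = list(range(scanned, min(scanned + to_try, total)))
        batches.append(batch)
        for i in batch:                     # in Spark this batch would run in parallel
            buf.extend(partitions[i][: n - len(buf)])
        scanned += len(batch)

    return buf, batches


# 100 partitions of 10 elements each, as in the example above.
partitions = [[p * 10 + i for i in range(10)] for p in range(100)]
result, batches = simulated_take(partitions, 200)
print(len(result), [(b[0], b[-1]) for b in batches])
```

With this input the simulation makes two passes: partition 0 alone, then partitions 1 to 29 as a single batch, matching the walk-through in the answer.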

Check out the code, it's well documented: https://github.com/apache/spark/blob/v1.2.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1049

I think the documentation is wrong. Local calculation only happens when a single partition is required. This is the case in the first pass (fetching partition 0), but typically not the case in later passes.
