What is the efficient way to update value inside Spark's RDD?


Problem Description

I'm writing a graph-related program in Scala with Spark. The dataset has 4 million nodes and 4 million edges (you can treat it as a tree), but in each iteration I only edit a portion of it, namely the sub-tree rooted at a given node and the nodes on the path between that node and the root.

The iterations have a dependency: iteration i+1 needs the result from iteration i, so I need to store the result of each iteration for the next step.

I'm trying to find an efficient way to update an RDD, but have no clue so far. I found that PairRDD has a lookup function which could reduce the computation time from O(N) to O(M), where N denotes the total number of objects in the RDD and M denotes the number of elements in each partition.
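A minimal sketch of that lookup behavior, assuming hypothetical node IDs and values: when the pair RDD has a known partitioner, lookup only scans the single partition the key maps to instead of the whole dataset.

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object LookupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("lookup-sketch").setMaster("local[*]"))

        // Hypothetical (nodeId, value) pairs. partitionBy attaches a
        // HashPartitioner, so Spark knows which partition holds each key.
        val nodes = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
          .partitionBy(new HashPartitioner(8))
          .cache()

        // With a partitioner set, lookup scans only one partition:
        // roughly O(M) for M elements per partition, not O(N) overall.
        val values: Seq[String] = nodes.lookup(2L) // Seq("b")
        println(values)

        sc.stop()
      }
    }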

So I'm wondering: is there any way to update an object in the RDD in O(M)? Or, more ideally, in O(1)? (I saw an email on Spark's mailing list saying that lookup can be modified to achieve O(1).)

Another thing: if I could achieve O(M) for updating the RDD, could I increase the number of partitions beyond the number of cores I have and achieve better performance?

Recommended Answer

An RDD is a distributed dataset, a partition is the unit of RDD storage, and the unit of RDD processing is an element.

For example, when you read a large file from HDFS as an RDD, the elements of that RDD are Strings (the lines of the file), and Spark stores the RDD across the cluster by partition. As a Spark user, you only need to care about how to process the lines of that file, just as if you were writing a normal program reading a file from the local file system line by line. That's the power of Spark :)
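A minimal sketch of that example (the HDFS path is hypothetical): each element of the RDD is one line, and you write per-element logic without caring how Spark partitions the lines.

    import org.apache.spark.{SparkConf, SparkContext}

    object LinesSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("lines-sketch").setMaster("local[*]"))

        // Each element of this RDD is a String: one line of the file.
        // Spark decides how the lines are split into partitions.
        val lines = sc.textFile("hdfs:///data/graph.txt") // hypothetical path

        // Per-element logic, as if reading the file line by line.
        val lengths = lines.map(_.length)
        println(lengths.take(5).mkString(", "))

        sc.stop()
      }
    }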

In any case, you have no idea which elements will be stored in a particular partition, so it doesn't make sense to try to update a specific partition.
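To make the element-versus-partition point concrete, here is a minimal sketch (with hypothetical node IDs and a hypothetical toUpdate set): since RDDs are immutable and you address elements rather than partitions, an "update" is expressed as a transformation that yields a new RDD with the changed values.

    import org.apache.spark.{SparkConf, SparkContext}

    object UpdateSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("update-sketch").setMaster("local[*]"))

        val nodes = sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30)))

        // Hypothetical IDs of the affected elements (e.g. a sub-tree).
        val toUpdate = Set(2L, 3L)

        // Express the update per element; the result is a new RDD,
        // not an in-place write into some partition.
        val updated = nodes.map { case (id, v) =>
          if (toUpdate.contains(id)) (id, v + 1) else (id, v)
        }
        println(updated.collect().mkString(", "))

        sc.stop()
      }
    }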

