How does Spark's RDD.randomSplit actually split the RDD
Question
So assume I've got an RDD with 3000 rows. The first 2000 rows are of class 1 and the last 1000 rows are of class 2. The RDD is partitioned across 100 partitions.
When calling RDD.randomSplit(0.8, 0.2):
Does the function also shuffle the RDD? Or does the split simply sample 20% of the RDD contiguously? Or does it select 20% of the partitions at random?
Ideally I'd like the resulting splits to have the same class distribution as the original RDD, i.e. 2:1.
Thanks
Answer
For each range defined by the weights array there is a separate mapPartitionsWithIndex transformation which preserves partitioning.
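As a rough sketch of those semantics (plain Python, not Spark's actual implementation): the weights are normalized into cumulative [lower, upper) ranges, and each output split is produced by a separate pass that re-seeds the same random generator, so every pass sees the identical sequence of draws and each element lands in exactly one range. The function name and structure here are illustrative assumptions.

```python
import random

def random_split(data, weights, seed=42):
    """Sketch of RDD.randomSplit semantics: normalize weights into
    cumulative [lb, ub) ranges, then make one pass per range.
    Re-seeding the RNG per pass means every element gets the same
    random draw in each pass, so the splits are disjoint and exhaustive."""
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:
        lb = acc
        acc += w / total
        bounds.append((lb, acc))
    splits = []
    for lb, ub in bounds:
        rng = random.Random(seed)  # same seed for every range
        splits.append([x for x in data if lb <= rng.random() < ub])
    return splits

parts = random_split(range(3000), [0.8, 0.2])
```

Note that each pass scans the whole input; this mirrors how Spark evaluates each split as its own transformation over the parent RDD rather than shuffling data between splits.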
Each partition is sampled using a BernoulliCellSampler. It iterates over the elements of a given partition and selects an item if the value of the next random Double falls within the range defined by the normalized weights. This means randomSplit:
- does not shuffle the RDD
- does not take contiguous blocks, other than by chance
- takes a random sample from each partition
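Because every element is kept independently with the same probability, the class distribution is preserved approximately (not exactly) in each split. A small plain-Python demo under the question's assumptions (3000 rows, 2000 of class 1 followed by 1000 of class 2, element-wise sampling at probability 0.8 as the Bernoulli sampler would do for the 80% split):

```python
import random

# Hypothetical demo, not Spark code: first 2000 rows class 1,
# last 1000 rows class 2, each row kept independently with p = 0.8.
rows = [1] * 2000 + [2] * 1000
rng = random.Random(0)
kept = [label for label in rows if rng.random() < 0.8]

c1 = kept.count(1)  # close to 2000 * 0.8 = 1600
c2 = kept.count(2)  # close to 1000 * 0.8 = 800
ratio = c1 / c2     # hovers around 2.0, the original 2:1 ratio
```

So the 2:1 ratio carries over in expectation, with binomial noise around it. If an exactly stratified split is required, a per-class sampling approach (e.g. Spark's sampleByKey with per-class fractions) is the usual alternative.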