Why didn't Spark's repartition balance the data across partitions?

Views: 121 Published: 2020/9/4 3:24:06 Tags: apache-spark pyspark rdd


Problem description

>>> rdd = sc.parallelize(range(10), 2)
>>> rdd.glom().collect()
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
>>> rdd.repartition(3).glom().collect()
[[], [0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
>>>

After repartitioning into 3 partitions, the first partition is empty and the other two still hold the original contents unchanged. Why? I would really appreciate an explanation.
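One possible explanation, offered as an assumption and not something stated in the question: PySpark may shuffle serialized batches of elements, not individual elements, so a partition smaller than the batch size would move as a single unit. The toy model below is plain Python, not Spark code. The names `batched` and `simulate_repartition`, the round-robin dealing, and the default `batch_size=10` are all hypothetical choices for illustration, and which output partition ends up empty in real Spark can differ from this model.

```python
from itertools import islice


def batched(partition, batch_size):
    """Split one partition into consecutive batches of at most batch_size items."""
    it = iter(partition)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def simulate_repartition(partitions, num_partitions, batch_size=10):
    """Toy model (an assumption, not Spark's code): deal whole batches
    round-robin into the output partitions."""
    out = [[] for _ in range(num_partitions)]
    slot = 0
    for partition in partitions:
        for batch in batched(partition, batch_size):
            out[slot % num_partitions].extend(batch)
            slot += 1
    return out


source = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# Each 5-element partition fits in one batch, so only two batches exist
# and one of the three output partitions stays empty.
print(simulate_repartition(source, 3))

# Dealing element by element (batch_size=1) spreads the data out.
print(simulate_repartition(source, 3, batch_size=1))
```

Under this model, the imbalance should disappear once each partition holds many more elements than the batch size. In real PySpark, `rdd.glom().map(len).collect()` is a convenient way to check partition sizes on a larger RDD.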
