spark creating too many partitions


Problem description

I have a 3-node Cassandra cluster with 1 seed node, plus 1 Spark master and 3 slave nodes, each with 8 GB RAM and 2 cores. Here is the input to my Spark jobs:

spark.cassandra.input.split.size_in_mb 67108864

When I run with this configuration set, I see that around 768 partitions are created for roughly 89.1 MB of data (about 1,706,765 records). I am not able to understand why so many partitions are created. I am using Cassandra Spark connector version 1.4, so the bug regarding input split size is already fixed.

There are only 11 unique partition keys. My partition key consists of an appname, which is always "test", and a random number, which is always from 0-10, so there are only 11 distinct partition keys.

Why are there so many partitions, and how does Spark decide how many partitions to create?

Recommended answer

The Cassandra connector does not use defaultParallelism. It checks a system table in C* (post 2.1.5) for an estimate of how many MB of data are in the given table. This amount is read and divided by the input split size to determine the number of splits to make.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md#what-does-inputsplitsize_in_mb-use-to-determine-size
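The division described above can be sketched as follows. This is a simplified illustration, not the connector's actual code; the 89.1 MB figure comes from the question, the 64 MB split size is the connector's documented default, and the exact rounding inside the connector may differ:

```scala
object SplitEstimate extends App {
  // Size estimate the connector would read from C*'s system.size_estimates
  // table (post 2.1.5); here we plug in the figure from the question.
  val estimatedTableSizeMB = 89.1
  // Default value of spark.cassandra.input.split.size_in_mb.
  val inputSplitSizeMB = 64.0

  // Divide the estimated table size by the split size, rounding up,
  // with a floor of one split.
  val numSplits = math.max(1, math.ceil(estimatedTableSizeMB / inputSplitSizeMB).toInt)

  println(numSplits) // 2 splits for these numbers
}
```

Note that the question sets `spark.cassandra.input.split.size_in_mb` to 67108864, which is 64 MB expressed in bytes; since the parameter is interpreted in megabytes, a mismatch like this can easily produce an unexpected split count.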

If you are on C* < 2.1.5, you will need to manually set the partitioning via a ReadConf.
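A minimal sketch of setting the partitioning manually, assuming the connector 1.4 `ReadConf` API with its `splitCount` field; the keyspace, table name, and split count below are placeholders, and this fragment needs a running Spark context and Cassandra cluster, so it is shown as a configuration sketch only:

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.ReadConf

// Force a fixed number of Spark partitions instead of relying on the
// size estimate (useful on C* < 2.1.5, where no estimate is available).
// "my_keyspace", "my_table", and 24 are hypothetical values.
val rdd = sc.cassandraTable("my_keyspace", "my_table")
  .withReadConf(ReadConf(splitCount = Some(24)))
```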

