spark-cassandra-connector 3.0.0 - How to calculate directJoinSizeRatio

Question

I have a 16-node Cassandra cluster and a table that, according to cfstats, totals around 143 GB (8.9 GB per node × 16). I have a replication factor of 3 (I am not sure whether that is relevant), and the table has 4,827 unique partition keys. I am trying to calculate the ratio so that the direct join is turned off every time I want to join on more than 3,170 partition keys.

The formula for the directJoinSizeRatio parameter is:

(table size * directJoinSizeRatio) > size of keys
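The inequality above can be written as a small predicate. This is a minimal sketch of the formula as quoted, not the connector's actual code; the function and parameter names are hypothetical, both sizes are assumed to be in bytes, and it assumes the direct join is chosen when the inequality holds.

```python
def direct_join_chosen(table_size_bytes: float, ratio: float, keys_size_bytes: float) -> bool:
    """Hypothetical restatement of: (table size * directJoinSizeRatio) > size of keys.

    Returns True when the inequality holds, i.e. when a direct join would be used.
    """
    return table_size_bytes * ratio > keys_size_bytes


# The question's figures: a 143 GB table and 3,170 keys of 36 bytes each.
# The ratio 0.9 is only an illustrative value, not taken from the question.
print(direct_join_chosen(143_000_000_000, 0.9, 3170 * 36))  # True
```

Note that the comparison is strict, so when both sides are exactly equal the direct join is not chosen.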

But what exactly does the table size refer to? Is it the sum of the compacted table sizes that cfstats reports on every node, or just the size of the table on one node?

So, in my case, I have 4,827 UUIDs stored as strings. Would it be:

143,000,000,000 bytes * parameter > 3,170 * 36 bytes?

8,900,000,000 bytes * parameter > 3,170 * 36 bytes?

Does this mean that I really have to lower the parameter to 0.000000798 or 0.000012822, respectively?
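The two candidate values come from solving the inequality for the parameter: at the break-even ratio, table size × ratio equals the size of 3,170 keys, so any larger key set keeps the direct join off. A quick check of that arithmetic, assuming (as the question does) 36 bytes per UUID string and trying each table-size interpretation in turn; the helper name is hypothetical.

```python
KEY_BYTES = 36          # length of a UUID as a string, e.g. "123e4567-e89b-12d3-a456-426614174000"
KEYS_THRESHOLD = 3170   # above this many partition keys the direct join should be off


def break_even_ratio(table_size_bytes: int) -> float:
    """Ratio at which table_size_bytes * ratio equals the size of KEYS_THRESHOLD keys."""
    return KEYS_THRESHOLD * KEY_BYTES / table_size_bytes


for label, size in [("whole cluster (143 GB)", 143_000_000_000),
                    ("single node (8.9 GB)", 8_900_000_000)]:
    print(f"{label}: {break_even_ratio(size):.9f}")
```

This reproduces the question's figures of 0.000000798 and 0.000012822.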

Answer

Because you're operating on partition keys, you may need to use them to calculate the desired ratio. In this case it would be 3170 / 4827 ≈ 0.657. That won't be precise, though: your partitions could be smaller or bigger than average, so the size estimate won't be exact, and the table size estimate is itself only an estimate. I would experiment with ratios in the range 0.6 to 0.66...
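The partition-count heuristic above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes partitions are of roughly equal size, so that the fraction of partitions joined approximates the fraction of the table's data; as the answer notes, that assumption may not hold. The constant names are illustrative.

```python
TOTAL_PARTITIONS = 4827  # unique partition keys in the table, as stated in the question
MAX_KEYS = 3170          # above this many keys the direct join should be turned off

# Fraction of partitions touched by a join on MAX_KEYS keys; with equal-sized
# partitions this also approximates the fraction of the table's bytes.
ratio = MAX_KEYS / TOTAL_PARTITIONS
print(f"{ratio:.3f}")  # 0.657

# Roughly how many keys each end of the suggested 0.6 to 0.66 range corresponds to.
for r in (0.6, 0.66):
    print(r, round(r * TOTAL_PARTITIONS))
```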