Calculating the size of a table in Cassandra

Question

In "Cassandra: The Definitive Guide" (2nd edition) by Jeff Carpenter and Eben Hewitt, the following formula is used to calculate the size of a table on disk (apologies for the blurred part):

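  • ck: primary key columns
  • cs: static columns
  • cr: regular columns
  • cc: clustering columns
  • Nr: number of rows
  • Nv: used to calculate the total size of the timestamps (I don't fully understand this part, but I'll ignore it for now).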

There are two things I don't understand in this equation.

First: why does the size of the clustering columns get counted for every regular column? Shouldn't we multiply it by the number of rows? It seems to me that by calculating it this way, we're saying that the data in each clustering column gets replicated for each regular column, which I suppose is not the case.

Second: why don't the primary key columns get multiplied by the number of partitions? From my understanding, if we have a node with two partitions, then we should multiply the size of the primary key columns by two, because we'll have two different primary keys on that node.

Recommended Answer

It's because of the internal storage structure used by Cassandra versions before 3.0:

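  • There is only one entry for each distinct partition key value.
  • For each distinct partition key value, there is only one entry for the static column.
  • There is an empty entry for each clustering key (the row marker).
  • For every column in a row, there is one entry for each clustering key column, i.e. the clustering key values are repeated in the name of every cell.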

Take an example:

CREATE TABLE my_table (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    d1 int,
    d2 int,
    s int static,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
); 

Insert some dummy data:

 pk1 | pk2 | ck1 | ck2  | s     | d1     | d2
-----+-----+-----+------+-------+--------+---------
   1 |  10 | 100 | 1000 | 10000 | 100000 | 1000000
   1 |  10 | 100 | 1001 | 10000 | 100001 | 1000001
   2 |  20 | 200 | 2000 | 20000 | 200000 | 2000000

The internal structure will then be:

             |100:1000:  |100:1000:d1|100:1000:d2|100:1001:  |100:1001:d1|100:1001:d2|  
-----+-------+-----------+-----------+-----------+-----------+-----------+-----------+
1:10 | 10000 |           |  100000   |  1000000  |           |  100001   |  1000001  |


             |200:2000:  |200:2000:d1|200:2000:d2|
-----+-------+-----------+-----------+-----------+ 
2:20 | 20000 |           |  200000   |  2000000  |
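The following minimal Python sketch makes the layout concrete. It regroups the three CQL rows into pre-3.0 style storage rows, with one entry per partition. This is a model of the layout shown above, not Cassandra's actual storage code. The dictionary representation, the hypothetical to_storage_rows helper and the plain "s" cell name for the static column are all illustrative assumptions.

```python
# Illustrative model of the example table (column names from the CREATE TABLE).
PARTITION_KEY = ("pk1", "pk2")
CLUSTERING_KEY = ("ck1", "ck2")
REGULAR_COLUMNS = ("d1", "d2")
STATIC_COLUMN = "s"

ROWS = [
    {"pk1": 1, "pk2": 10, "ck1": 100, "ck2": 1000, "s": 10000, "d1": 100000, "d2": 1000000},
    {"pk1": 1, "pk2": 10, "ck1": 100, "ck2": 1001, "s": 10000, "d1": 100001, "d2": 1000001},
    {"pk1": 2, "pk2": 20, "ck1": 200, "ck2": 2000, "s": 20000, "d1": 200000, "d2": 2000000},
]


def to_storage_rows(rows):
    """Regroup CQL rows into pre-3.0 style storage rows (hypothetical helper).

    Returns {partition_key_string: [(cell_name, value), ...]} with one entry
    per distinct partition key (dicts keep insertion order in Python 3.7+).
    """
    partitions = {}
    for row in rows:
        pk = ":".join(str(row[c]) for c in PARTITION_KEY)
        cells = partitions.setdefault(pk, [])
        if not cells:
            # Static column: stored once per partition. It is labelled here by
            # its plain name (an assumption; the diagram leaves it blank).
            cells.append((STATIC_COLUMN, row[STATIC_COLUMN]))
        prefix = ":".join(str(row[c]) for c in CLUSTERING_KEY)
        # Empty cell for the clustering key itself (the row marker).
        cells.append((prefix + ":", None))
        # Each regular column repeats the clustering values in its cell name.
        for col in REGULAR_COLUMNS:
            cells.append((f"{prefix}:{col}", row[col]))
    return partitions


for pk, cells in to_storage_rows(ROWS).items():
    print(pk, cells)
```

The output shows why the clustering columns are counted once per regular column. Every regular cell name (for example "100:1000:d1") carries the full clustering prefix.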

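So the size of the table will be as follows. Every column in this example is a 4-byte int. The four primary key columns (pk1, pk2, ck1, ck2) and the static column s are counted once per partition. For each of the 2 rows, each regular column (d1, d2) is counted together with both clustering column values that prefix its cell name. The result is the size of a single partition, which is then multiplied by the number of partitions: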

Single Partition Size = (4 + 4 + 4 + 4) + 4 + 2 * ((4 + (4 + 4)) + (4 + (4 + 4))) bytes = 68 bytes

Estimated Table Size = Single Partition Size * Number of Partitions
                     = 68 * 2 bytes
                     = 136 bytes
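The arithmetic can be reproduced with a short Python sketch. Like the answer, it assumes 4-byte int values and ignores the timestamp (Nv) term. The function name partition_size and its parameters are hypothetical.

The answer's estimate treats both partitions as holding two rows. Partition 2:20 actually holds only one row, so the sketch also sums the partitions individually for comparison.

```python
INT_SIZE = 4  # assumption: every column in the example is a 4-byte int


def partition_size(pk_sizes, static_sizes, regular_sizes, clustering_sizes, n_rows):
    """Estimated bytes for one partition, following the answer's accounting.

    Primary key and static columns are counted once per partition. In every
    row, each regular column is counted together with all clustering column
    values (its cell-name prefix). Timestamps (the Nv term) are ignored.
    """
    clustering_prefix = sum(clustering_sizes)
    per_row = sum(size + clustering_prefix for size in regular_sizes)
    return sum(pk_sizes) + sum(static_sizes) + n_rows * per_row


PK = [INT_SIZE] * 4          # pk1, pk2, ck1, ck2
STATIC = [INT_SIZE]          # s
REGULAR = [INT_SIZE] * 2     # d1, d2
CLUSTERING = [INT_SIZE] * 2  # ck1, ck2

single = partition_size(PK, STATIC, REGULAR, CLUSTERING, n_rows=2)
estimate = single * 2  # two partitions, each treated as having 2 rows

# Per-partition sum: partition 1:10 has 2 rows, partition 2:20 has 1 row.
exact = sum(partition_size(PK, STATIC, REGULAR, CLUSTERING, n) for n in (2, 1))

print(single, estimate, exact)  # 68 136 112
```

The 136-byte estimate matches the answer's calculation. The per-partition sum of 112 bytes is smaller only because the second partition has fewer rows.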
