Calculating the size of a table in Cassandra
Question
In "Cassandra: The Definitive Guide" (2nd edition) by Jeff Carpenter & Eben Hewitt, the following formula is used to calculate the size of a table on disk (apologies for the blurred part):
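- ck: primary key columns
- cs: static columns
- cr: regular columns
- cc: clustering columns
- Nr: number of rows
- Nv: used for counting the total size of the timestamps (I don't fully understand this part, but I'll ignore it for now).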
There are two things I don't understand in this equation.
First: why does the size of the clustering columns get counted for every regular column? Shouldn't we multiply it by the number of rows instead? It seems to me that calculating it this way says the data in each clustering column is replicated for each regular column, which I suppose is not the case.
Second: why aren't the primary key columns multiplied by the number of partitions? From my understanding, if we have a node with two partitions, then we should multiply the size of the primary key columns by two, because we'll have two different primary keys on that node.
Recommended answer
It's because of the internal storage structure of Cassandra versions before 3.0. In that structure:
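- There is only one entry for each distinct partition key value.
- For each distinct partition key value, there is only one entry for the static columns.
- There is an empty entry for the clustering key.
- For each column in a row, there is a single entry for each clustering key column.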
Let's take an example:
CREATE TABLE my_table (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    d1 int,
    d2 int,
    s int static,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
);
Insert some dummy data:
 pk1 | pk2 | ck1 | ck2  | s     | d1     | d2
-----+-----+-----+------+-------+--------+---------
   1 |  10 | 100 | 1000 | 10000 | 100000 | 1000000
   1 |  10 | 100 | 1001 | 10000 | 100001 | 1000001
   2 |  20 | 200 | 2000 | 20000 | 200000 | 2000000
The internal structure will be:
             |100:1000:  |100:1000:d1|100:1000:d2|100:1001:  |100:1001:d1|100:1001:d2|
-----+-------+-----------+-----------+-----------+-----------+-----------+-----------+
1:10 | 10000 |           |  100000   |  1000000  |           |  100001   |  1000001  |

             |200:2000:  |200:2000:d1|200:2000:d2|
-----+-------+-----------+-----------+-----------+
2:20 | 20000 |           |  200000   |  2000000  |
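To make the layout above easier to follow, here is a minimal Python sketch that groups the example rows by partition key and flattens each row into cells named after its clustering values. The function name `to_wide_rows`, the `s` cell name for the static column, and the use of `None` for the empty marker cell are assumptions made for illustration; the real on-disk encoding of composite cell names is more involved.

```python
from collections import OrderedDict

# Rows from the example table: (pk1, pk2, ck1, ck2, s, d1, d2)
rows = [
    (1, 10, 100, 1000, 10000, 100000, 1000000),
    (1, 10, 100, 1001, 10000, 100001, 1000001),
    (2, 20, 200, 2000, 20000, 200000, 2000000),
]


def to_wide_rows(rows):
    """Group CQL rows into a {partition key: {cell name: value}} mapping.

    Hypothetical helper: it only mimics the shape of the pre-3.0 layout.
    """
    partitions = OrderedDict()
    for pk1, pk2, ck1, ck2, s, d1, d2 in rows:
        # One entry per distinct partition key value.
        cells = partitions.setdefault(f"{pk1}:{pk2}", OrderedDict())
        # The static column is stored once per partition.
        cells["s"] = s
        # Every cell of a row repeats the clustering values as a prefix.
        prefix = f"{ck1}:{ck2}:"
        cells[prefix] = None          # empty entry for the clustering key
        cells[prefix + "d1"] = d1     # regular column d1
        cells[prefix + "d2"] = d2     # regular column d2
    return partitions


wide = to_wide_rows(rows)
for key, cells in wide.items():
    print(key, list(cells.items()))
```

Because the clustering prefix is part of every cell name, its size is paid once per regular column rather than once per row, which is what the first question asks about.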
So the size of the table will be:
Single Partition Size = (4 + 4 + 4 + 4) + 4 + 2 * ((4 + (4 + 4)) + (4 + (4 + 4))) bytes = 68 bytes
Estimated Table Size  = Single Partition Size * Number of Partitions
                      = 68 * 2 bytes
                      = 136 bytes
- Here all field types are int (4 bytes)
- There are 4 primary key columns, 1 static column, 2 clustering key columns and 2 regular columns
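The arithmetic above can be reproduced with a short Python sketch. The function `partition_size` and its parameter names are hypothetical, chosen only to mirror the terms of the calculation; it leaves out the timestamp term (Nv) and any other storage overhead, so it is a rough estimate rather than an exact on-disk figure.

```python
INT_SIZE = 4  # bytes; every column in the example table is an int


def partition_size(primary_key_sizes, static_sizes, clustering_sizes,
                   regular_sizes, rows):
    """Estimate one partition's size in bytes, following the answer's arithmetic."""
    clustering_total = sum(clustering_sizes)
    # Each regular column value is stored together with all clustering values.
    per_row = sum(size + clustering_total for size in regular_sizes)
    return sum(primary_key_sizes) + sum(static_sizes) + rows * per_row


single_partition = partition_size(
    primary_key_sizes=[INT_SIZE] * 4,  # pk1, pk2, ck1, ck2
    static_sizes=[INT_SIZE],           # s
    clustering_sizes=[INT_SIZE] * 2,   # ck1, ck2
    regular_sizes=[INT_SIZE] * 2,      # d1, d2
    rows=2,                            # rows in the partition
)
number_of_partitions = 2
estimated_table_size = single_partition * number_of_partitions

print(single_partition)      # 68
print(estimated_table_size)  # 136
```

The estimate assumes every partition holds the same number of rows, as the calculation above does.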