Calculating the size of a table in Cassandra

Views: 130 · Published: 2020/9/29 19:30:45 · Tag: cassandra

Question

In "Cassandra: The Definitive Guide" (2nd edition) by Jeff Carpenter & Eben Hewitt, the following formula is used to calculate the size of a table on disk (apologies for the blurred part):

ck: primary key columns
cs: static columns
cr: regular columns
cc: clustering columns
Nr: number of rows
Nv: it's used for counting the total size of the timestamps (I don't get this part completely, but for now I'll ignore it).
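
The formula image is not reproduced in this copy. Judging from the symbol list above and the worked calculation in the answer, it appears to have roughly the following shape. This is a reconstruction, not a quotation: $S_t$ (the estimated size) and $t_{avg}$ (the average timestamp size per stored value) are assumed symbol names.

```latex
S_t = \sum_i \operatorname{sizeOf}(c_{k_i})
    + \sum_j \operatorname{sizeOf}(c_{s_j})
    + N_r \times \sum_k \Big( \operatorname{sizeOf}(c_{r_k}) + \sum_l \operatorname{sizeOf}(c_{c_l}) \Big)
    + N_v \times \operatorname{sizeOf}(t_{avg})
```

Note that the clustering column sum sits inside the sum over regular columns, which is what the first point of the question refers to.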

There are two things I don't understand in this equation.

First: why does the size of the clustering columns get counted once for every regular column? Shouldn't it be multiplied by the number of rows instead? It seems to me that calculating it this way implies that the data in each clustering column is replicated for each regular column, which I suppose is not the case.

Second: why aren't the primary key columns multiplied by the number of partitions? As I understand it, if a node holds two partitions, the size of the primary key columns should be multiplied by two, because that node will contain two different primary keys.

Recommended answer

It's because of the internal storage structure used by Cassandra versions earlier than 3.0:

There is only one entry for each distinct partition key value.
For each distinct partition key value, there is only one entry per static column.
For each row, there is one empty entry for the clustering key.
For each regular column in a row, there is a single entry whose name carries the row's clustering key column values.

Let's take an example:

CREATE TABLE my_table (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    d1 int,
    d2 int,
    s int static,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
);

Insert some dummy data:

 pk1 | pk2 | ck1 | ck2  | s     | d1     | d2
-----+-----+-----+------+-------+--------+---------
   1 |  10 | 100 | 1000 | 10000 | 100000 | 1000000
   1 |  10 | 100 | 1001 | 10000 | 100001 | 1000001
   2 |  20 | 200 | 2000 | 20000 | 200000 | 2000000

The internal structure will be:

             |100:1000:  |100:1000:d1|100:1000:d2|100:1001:  |100:1001:d1|100:1001:d2|  
-----+-------+-----------+-----------+-----------+-----------+-----------+-----------+
1:10 | 10000 |           |  100000   |  1000000  |           |  100001   |  1000001  |


             |200:2000:  |200:2000:d1|200:2000:d2|
-----+-------+-----------+-----------+-----------+ 
2:20 | 20000 |           |  200000   |  2000000  |
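
The layout above can be reproduced with a small sketch. This is only an illustration of the four rules, not Cassandra's actual storage code: the function name `to_pre3_layout`, the cell name `"s"` for the static column, and the use of `None` for the empty clustering entry are all assumptions made for the example. The values follow the internal-structure diagram.

```python
# Rows as (pk1, pk2, ck1, ck2, s, d1, d2), matching the diagram above.
ROWS = [
    (1, 10, 100, 1000, 10000, 100000, 1000000),
    (1, 10, 100, 1001, 10000, 100001, 1000001),
    (2, 20, 200, 2000, 20000, 200000, 2000000),
]


def to_pre3_layout(rows):
    """Group CQL rows into {partition key: {cell name: value}}."""
    partitions = {}
    for pk1, pk2, ck1, ck2, s, d1, d2 in rows:
        # One entry per distinct partition key value.
        cells = partitions.setdefault(f"{pk1}:{pk2}", {})
        # One entry per static column per partition (hypothetical name "s").
        cells["s"] = s
        prefix = f"{ck1}:{ck2}:"
        # One empty entry per row for the clustering key.
        cells[prefix] = None
        # One entry per regular column, named with the clustering values.
        cells[prefix + "d1"] = d1
        cells[prefix + "d2"] = d2
    return partitions


layout = to_pre3_layout(ROWS)
for key, cells in layout.items():
    print(key, cells)
```

Each regular-column entry repeats the clustering values in its name, which is why the formula counts the clustering columns once per regular column.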

So the size of the table will be:

Single Partition Size = (4 + 4 + 4 + 4) + 4 + 2 * ((4 + (4 + 4)) + (4 + (4 + 4))) bytes = 68 bytes

Estimated Table Size = Single Partition Size * Number of Partitions
                     = 68 * 2 bytes
                     = 136 bytes

Here all field types are int (4 bytes).

There are 4 primary key columns, 1 static column, 2 clustering key columns, and 2 regular columns.
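
The same arithmetic can be written as a short sketch. The function `partition_size` and its parameter names are hypothetical helpers introduced for this example; the timestamp term (Nv) is left out, as in the calculation above.

```python
INT_SIZE = 4  # bytes; every column in my_table is an int


def partition_size(key_sizes, static_sizes, regular_sizes, clustering_sizes, n_rows):
    """Estimate one partition's size in bytes, ignoring timestamps."""
    clustering_total = sum(clustering_sizes)
    # Each regular-column entry also carries the clustering values.
    per_row = sum(size + clustering_total for size in regular_sizes)
    return sum(key_sizes) + sum(static_sizes) + n_rows * per_row


single = partition_size(
    key_sizes=[INT_SIZE] * 4,         # pk1, pk2, ck1, ck2
    static_sizes=[INT_SIZE],          # s
    regular_sizes=[INT_SIZE] * 2,     # d1, d2
    clustering_sizes=[INT_SIZE] * 2,  # ck1, ck2
    n_rows=2,
)
table = single * 2  # two partitions

print(single, table)  # 68 136
```

With `n_rows=1` the same function gives 44 bytes, so the 136-byte figure is an estimate that treats both partitions as holding two rows.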
