Calculating the size of a table in Cassandra


Question

In "Cassandra: The Definitive Guide" (2nd edition) by Jeff Carpenter & Eben Hewitt, the following formula is used to calculate the size of a table on disk (apologies for the blurred part):


  • ck: primary key columns
  • cs: static columns
  • cr: regular columns
  • cc: clustering columns
  • Nr: number of rows
  • Nv: used for counting the total size of the timestamps (I don't fully understand this part, but I'll ignore it for now).

There are two things I don't understand in this equation.

First: why does the size of the clustering columns get counted once for every regular column? Shouldn't we multiply it by the number of rows instead? It seems to me that calculating it this way says that the data in each clustering column gets replicated for each regular column, which I suppose is not the case.

Second: why don't the primary key columns get multiplied by the number of partitions? From my understanding, if we have a node with two partitions, then we should multiply the size of the primary key columns by two, because we'll have two different primary keys on that node.

Recommended answer

It's because of the internal storage structure used by Cassandra versions before 3.


  • There is only one entry for each distinct partition key value.
  • For each distinct partition key value there is only one entry for each static column.
  • There is an empty entry for the clustering key of each row.
  • For each regular column in a row there is a single entry, and that entry repeats the values of all the clustering key columns.

Let's take an example:

CREATE TABLE my_table (
    pk1 int,
    pk2 int,
    ck1 int,
    ck2 int,
    d1 int,
    d2 int,
    s int static,
    PRIMARY KEY ((pk1, pk2), ck1, ck2)
); 

Let's insert some dummy data:

 pk1 | pk2 | ck1 | ck2  | s     | d1     | d2
-----+-----+-----+------+-------+--------+---------
   1 |  10 | 100 | 1000 | 10000 | 100000 | 1000000
   1 |  10 | 100 | 1001 | 10000 | 100001 | 1000001
   2 |  20 | 200 | 2000 | 20000 | 200000 | 2000000

The internal structure will be:

             |100:1000:  |100:1000:d1|100:1000:d2|100:1001:  |100:1001:d1|100:1001:d2|  
-----+-------+-----------+-----------+-----------+-----------+-----------+-----------+
1:10 | 10000 |           |  100000   |  1000000  |           |  100001   |  1000001  |


             |200:2000:  |200:2000:d1|200:2000:d2|
-----+-------+-----------+-----------+-----------+ 
2:20 | 20000 |           |  200000   |  2000000  |
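
The layout above can be reproduced with a small Python sketch. It is only an illustrative model of the pre-3 wide-row layout described in this answer, not Cassandra's actual storage code. The function name `to_wide_rows`, the dict-based representation, and the use of `s` as the static cell's name are assumptions made for the example.

```python
# Illustrative model only: these names and structures are hypothetical,
# they are not part of Cassandra.
PARTITION_KEY = ("pk1", "pk2")
CLUSTERING_KEY = ("ck1", "ck2")
STATIC_COLUMNS = ("s",)
REGULAR_COLUMNS = ("d1", "d2")

# Values as shown in the internal structure above.
rows = [
    {"pk1": 1, "pk2": 10, "ck1": 100, "ck2": 1000,
     "s": 10000, "d1": 100000, "d2": 1000000},
    {"pk1": 1, "pk2": 10, "ck1": 100, "ck2": 1001,
     "s": 10000, "d1": 100001, "d2": 1000001},
    {"pk1": 2, "pk2": 20, "ck1": 200, "ck2": 2000,
     "s": 20000, "d1": 200000, "d2": 2000000},
]


def to_wide_rows(rows):
    """Group CQL rows into one wide row of named cells per partition key."""
    partitions = {}
    for row in rows:
        pkey = ":".join(str(row[c]) for c in PARTITION_KEY)
        cells = partitions.setdefault(pkey, {})
        # Static columns: one cell per partition, no clustering prefix.
        for col in STATIC_COLUMNS:
            cells[col] = row[col]
        prefix = ":".join(str(row[c]) for c in CLUSTERING_KEY) + ":"
        # Empty marker cell for this row's clustering key.
        cells[prefix] = None
        # Each regular column's cell name repeats the clustering values.
        for col in REGULAR_COLUMNS:
            cells[prefix + col] = row[col]
    return partitions


wide = to_wide_rows(rows)
for pkey, cells in wide.items():
    print(pkey, cells)
```

In this model the clustering values appear in the name of every regular-column cell, which is why the formula adds the clustering column sizes once per regular column.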

So the size of the table will be:

Single Partition Size = (4 + 4 + 4 + 4) + 4 + 2 * ((4 + (4 + 4)) + (4 + (4 + 4))) bytes = 68 bytes

Estimated Table Size = Single Partition Size * Number of Partitions
                     = 68 * 2 bytes
                     = 136 bytes
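
The arithmetic above can be checked with a short Python sketch. It assumes every column is a 4-byte `int`, uses 2 rows per partition for every partition, and ignores timestamps and any other per-cell overhead, as the calculation above does. The function name `partition_size` and its parameters are hypothetical.

```python
INT_SIZE = 4  # bytes for a CQL int


def partition_size(primary_key_sizes, static_sizes, regular_sizes,
                   clustering_sizes, rows_per_partition):
    """Estimate one partition's size in bytes, ignoring timestamps."""
    clustering_total = sum(clustering_sizes)
    # Every regular-column cell carries the clustering values in its name,
    # so the clustering size is added once per regular column per row.
    per_row = sum(size + clustering_total for size in regular_sizes)
    return (sum(primary_key_sizes) + sum(static_sizes)
            + rows_per_partition * per_row)


single = partition_size(
    primary_key_sizes=[INT_SIZE] * 4,  # pk1, pk2, ck1, ck2
    static_sizes=[INT_SIZE],           # s
    regular_sizes=[INT_SIZE] * 2,      # d1, d2
    clustering_sizes=[INT_SIZE] * 2,   # ck1, ck2
    rows_per_partition=2,
)
number_of_partitions = 2
table_size = single * number_of_partitions
print(single, table_size)
```

Under these assumptions the key and static terms are counted once per partition, and the multiplication by the number of partitions happens at the table level rather than inside the single-partition formula.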


