Cassandra has a limit of 2 billion cells per partition, but what's a partition?


Question

The Cassandra wiki says there is a limit of 2 billion cells (rows x columns) per partition, but it is unclear to me what a partition is.

Do we have one partition per node per column family, which would mean that the maximum size of a column family is 2 billion cells * the number of nodes in the cluster?

Or will Cassandra create as many partitions as required to store all the data of a column family?

I am starting a new project, so I will use Cassandra 2.0.

Recommended answer

With the advent of CQL3, the terminology has changed slightly from the old Thrift terms.

Basically,

CREATE TABLE foo (a int, b int, c int, d int, PRIMARY KEY ((a, b), c));

will create a CQL3 table. The values of a and b together form the partition key, which determines which node the data resides on. This is the "partition" referred to in the 2 billion cell limit.
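To make the link between partition key and node concrete, here is a minimal sketch. It assumes a hypothetical three-node cluster and uses MD5 modulo the node count as a stand-in; real Cassandra uses a partitioner (Murmur3 by default) and a token ring, so treat this only as an illustration of the idea that every row with the same (a, b) lands in the same place.

```python
import hashlib

NODES = ["node1", "node2", "node3"]  # hypothetical cluster, not from the original answer


def node_for(a: int, b: int) -> str:
    """Map the partition key (a, b) to a node.

    MD5 modulo the node count is only a stand-in for Cassandra's
    partitioner and token ring.
    """
    digest = hashlib.md5(f"{a}:{b}".encode()).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


# Rows of foo as (a, b, c, d). Rows sharing (a, b) always map to the
# same node, whatever their values of c and d are.
rows = [(1, 2, 10, 100), (1, 2, 11, 101), (7, 9, 10, 102)]
placement = {(a, b): node_for(a, b) for a, b, c, d in rows}
```

The table therefore has as many partitions as there are distinct (a, b) pairs, not one partition per node.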

Within that partition, the information is organized by c, known as the clustering key. Together, a, b and c identify a unique value of d. In this case the number of cells in a partition is the number of (c, d) combinations stored in it, so for any given pair of a and b there can be at most 2 billion combinations of c and d.
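The counting can be sketched in a few lines. This is a simplified model, assuming one cell per stored (c, d) pair and ignoring any internal bookkeeping cells Cassandra may add; the sample rows are made up for illustration.

```python
from collections import defaultdict

# Made-up rows of foo(a, b, c, d) with PRIMARY KEY ((a, b), c).
rows = [
    (1, 2, 10, 100),
    (1, 2, 11, 101),
    (1, 2, 12, 102),
    (7, 9, 10, 200),
]

partitions = defaultdict(dict)
for a, b, c, d in rows:
    # Writing the same (a, b, c) again overwrites d instead of adding a cell.
    partitions[(a, b)][c] = d

cells_per_partition = {key: len(cols) for key, cols in partitions.items()}
print(cells_per_partition)  # {(1, 2): 3, (7, 9): 1}
```

The 2 billion limit applies to each of these counts separately, not to the table as a whole.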

So as you model your data, you want to ensure that the partition key part of the primary key varies enough for your data to be spread evenly across the Cassandra cluster. Then use clustering keys to ensure that your data is available in the order you want it.
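As a rough illustration of why the choice of partition key matters, the sketch below compares two hypothetical designs for made-up sensor readings: partitioning by sensor alone, and partitioning by (sensor, day). The data set and column names are assumptions for the example, not part of the original answer.

```python
from collections import Counter

# Hypothetical readings: (sensor_id, day, second_of_day).
readings = [
    (sensor, day, sec)
    for sensor in range(2)
    for day in range(5)
    for sec in range(100)
]

# Design 1: partition key is sensor_id only.
by_sensor = Counter(sensor for sensor, day, sec in readings)

# Design 2: partition key is (sensor_id, day).
by_sensor_day = Counter((sensor, day) for sensor, day, sec in readings)

print(max(by_sensor.values()))      # 500
print(max(by_sensor_day.values()))  # 100
```

With the composite key, each partition stays bounded by one day of data, and there are more distinct partitions to spread across nodes.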

Watch this video for more information on data modeling in Cassandra: "The Data Model is Dead, Long Live the Data Model".

CREATE TABLE foo (a int, b int, c int, d int, e int, f int, PRIMARY KEY ((a, b), c, d));

Here a partition is identified by the combination of a and b.

Within a partition, c and d are used to order the cells, so the layout will look a little like:

(a1,b1) --> [c1,d1 : e1], [c1,d1 : f1], [c1,d2 : e2] ....



So in this example you can have 2 billion cells, with each cell containing:


  • A value of c

  • A value of d

  • A value of either e or f

So the 2 billion limit refers to the sum of the unique tuples of (c,d,e) and (c,d,f).
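The same count can be written out for a single partition of the second table. This is a simplified model with made-up values, assuming that a column that was never written contributes no cell.

```python
# One partition (a1, b1) of foo(a, b, c, d, e, f), PRIMARY KEY ((a, b), c, d).
# Each clustering row (c, d) holds the optional regular columns e and f.
partition = {
    ("c1", "d1"): {"e": 1, "f": 2},
    ("c1", "d2"): {"e": 3, "f": None},  # f was never written
    ("c2", "d1"): {"e": None, "f": 4},  # e was never written
}

# One cell per written column, ordered by the clustering columns (c, d).
cells = [
    (c, d, name, value)
    for (c, d), columns in sorted(partition.items())
    for name, value in columns.items()
    if value is not None
]
print(len(cells))  # 4
```

Here three (c, d) rows yield four cells: two (c,d,e) tuples plus two (c,d,f) tuples.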
