High and low cardinality in Cassandra


Question

I keep coming across these terms: high cardinality and low cardinality in Cassandra.

I don't understand exactly what they mean, what effect they have on queries, and which is preferred. Please explain with an example, since that will be easy to follow.

Solution

The cardinality of X is simply the number of distinct elements that make up X. In Cassandra, the cardinality of the partition key is very important for partitioning data.

Since the partition key is responsible for distributing the data across the cluster, choosing a low-cardinality key can lead to a situation in which your data is not spread over all the nodes.
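To make the effect concrete, here is a minimal Python sketch of hash-based placement. It is a toy model, not Cassandra's actual partitioner: `owner_node` and `nodes_used` are hypothetical helpers, SHA-256 modulo the node count stands in for the real token ring, and replication is ignored.

```python
import hashlib
import uuid
from collections import Counter

NUM_NODES = 20  # cluster size used in this answer's example


def owner_node(partition_key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a partition key to a node index by hashing it.

    Assumption: hash modulo node count is a simplification of a token ring.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_used(partition_keys) -> Counter:
    """Count how many rows land on each node for the given partition keys."""
    return Counter(owner_node(str(key)) for key in partition_keys)


# 10,000 rows keyed by a 1..5 value (low cardinality)
# versus 10,000 rows keyed by a unique id (high cardinality).
low_cardinality_keys = [(i % 5) + 1 for i in range(10_000)]
high_cardinality_keys = [uuid.UUID(int=i) for i in range(10_000)]

by_low = nodes_used(low_cardinality_keys)
by_high = nodes_used(high_cardinality_keys)

print("nodes owning data, low-cardinality key: ", len(by_low))
print("nodes owning data, high-cardinality key:", len(by_high))
```

With only five distinct key values, at most five nodes can ever own a partition in this model, no matter how many rows are written.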

Imagine you have a cluster of 20 nodes storing comments, with a replication factor (RF) of 2. Each comment has its own vote, ranging from 1 to 5. Now, since you want to easily retrieve comments by vote, you might be tempted to choose vote as the partition key.

CREATE TABLE comments(vote int, content text, id uuid, PRIMARY KEY(vote, id));

In this situation the only key responsible for data distribution is vote, which has a very low cardinality since it can contain only 5 values (1, 2, 3, 4, 5). This means that, in the best case, 5 different nodes will own the 5 different partitions ("all comments with vote 1" ... "all comments with vote 5") and, again in the best case, with an RF of 2, 10 different nodes will hold your data. As you can see, you have a 20-node cluster of which no more than 50% is used, even in the best case.
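The arithmetic behind that 50% figure can be written as a small Python sketch. `max_nodes_holding_data` is a hypothetical helper, and it assumes the best case described above, where every partition and every replica lands on a different node.

```python
def max_nodes_holding_data(num_partitions: int, replication_factor: int, num_nodes: int) -> int:
    """Upper bound on the number of nodes that store at least one replica.

    Best-case assumption: each partition's replicas land on nodes that
    hold no other partition.
    """
    return min(num_nodes, num_partitions * replication_factor)


nodes = 20      # cluster size
rf = 2          # replication factor
partitions = 5  # one partition per possible vote value

used = max_nodes_holding_data(partitions, rf, nodes)
print(f"at most {used} of {nodes} nodes hold data ({used / nodes:.0%})")
# prints: at most 10 of 20 nodes hold data (50%)
```

Once the number of partitions times the RF exceeds the node count, the bound reaches the full cluster, which is what a high-cardinality partition key gives you.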

Data distribution is very important, which is why partition key cardinality matters so much.

HTH, Carlo
