Not quite clear about a Cassandra anti-pattern


Problem Description



Suppose there is a table with the following structure:

create table cities (
  root text,
  name text,
  primary key(root,name)
) with clustering order by (name asc); -- for getting them sorted

insert into cities(root,name) values('.','Moscow');
insert into cities(root,name) values('.','Tokio');
insert into cities(root,name) values('.','London');

select * from cities where root='.'; -- get'em sorted asc

When specifying a replication factor of 3 for the keyspace and using RandomPartitioner, there will be 3 replicas of each row on 3 nodes: the primary node, determined by the row's hash, and the next 2 nodes. Why should there be a hotspot? Isn't reading load-balanced across all the replicas?

Solution

Defining such a table, the partition key is root while name is a clustering key. As the name suggests, the partition key is responsible for partitioning -- so how does partitioning work? Let's say you have a 4-node cluster, and a hash function that generates only 8 keys (A, B, C, D, E, F, G, H) -- here is how the hashes are distributed in the cluster:

node 1 - (A,B)
node 2 - (C,D)
node 3 - (E,F)
node 4 - (G,H)

Each node uses the following 2 nodes as replicas, so the replicas for node 1 are (2, 3), the replicas for node 2 are (3, 4), the replicas for node 3 are (4, 1), and finally the replicas for node 4 are (1, 2).
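This token-to-node and replica mapping can be sketched as a tiny simulation. The 8-token hash space and the node layout are the toy assumptions from the example above, not Cassandra's real Murmur3/RandomPartitioner ring:

```python
# Toy model of the 4-node cluster described above.
# The 8-token hash space and node layout are illustrative
# assumptions, not Cassandra's actual partitioner.

TOKEN_OWNER = {"A": 1, "B": 1, "C": 2, "D": 2,
               "E": 3, "F": 3, "G": 4, "H": 4}

def replica_nodes(primary: int, cluster_size: int = 4, rf: int = 3):
    """RF=3: the primary node plus the next 2 nodes on the ring."""
    return [(primary - 1 + i) % cluster_size + 1 for i in range(rf)]

# As in the example: hash(root='.') returns token 'B', owned by node 1.
primary = TOKEN_OWNER["B"]
print(replica_nodes(primary))   # → [1, 2, 3]; node 4 never stores cities
```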

Let's say our function hash(root), when the root value is ., returns B, which belongs to node 1 -- node 1 will store the information and nodes (2, 3) will store the replicas. Node 4 is NEVER involved in the cities table and will not contain any data concerning this table (except for hinted handoff situations, which are not part of the concept) because of the fixed partition key.

In this example you use about 75% of your cluster, which may look like an acceptable situation ... but let's say that at some moment your application suffers because the 3 nodes involved cannot handle the read/write load. Now you can add as many nodes as you want to the cluster, but with this data model you won't be able to scale horizontally, because NO OTHER NODE WILL EVER BE INVOLVED IN THE cities TABLE. The only way I see to solve the problem in such a situation is to increase the power of these 3 nodes (vertical scaling) by adding more memory, a more powerful CPU, and faster I/O. Creating a schema that does not allow horizontal scaling is an anti-pattern.
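A quick way to see the scaling limit is to compare how many nodes end up serving a table when every row shares one partition key versus when the partition key varies per row. This is a sketch under toy assumptions (a made-up deterministic hash and a simple modular ring, not Cassandra's real partitioner):

```python
# Sketch: why a fixed partition key cannot scale horizontally.
# toy_hash and the modular ring are illustrative assumptions only.

def toy_hash(key: str) -> int:
    return sum(ord(c) for c in key)

def replicas(key: str, cluster_size: int, rf: int = 3):
    """Primary node by hash, plus the next rf-1 nodes on the ring."""
    primary = toy_hash(key) % cluster_size
    return {(primary + i) % cluster_size for i in range(rf)}

def nodes_used(keys, cluster_size):
    """How many distinct nodes serve any of these partition keys."""
    used = set()
    for k in keys:
        used |= replicas(k, cluster_size)
    return len(used)

fixed = ["."] * 1000                          # every row in one partition
varied = [f"city-{i}" for i in range(1000)]   # per-row partition keys

for size in (4, 8, 16):
    print(size, nodes_used(fixed, size), nodes_used(varied, size))
# → 4 3 4
# → 8 3 8
# → 16 3 16
```

With the fixed key, the same 3 replicas serve the table no matter how many nodes you add; with varied keys the whole cluster participates. A common mitigation (not part of the original answer) is to choose a higher-cardinality partition key, or to add a hypothetical bucket column to the partition key, so that partitions spread across the ring.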

