Not quite clear about a Cassandra anti-pattern


Question

Suppose there is a table with the following structure:

create table cities (
  root text,
  name text,
  primary key(root,name)
) with clustering order by (name asc); -- for getting them sorted

insert into cities(root,name) values('.','Moscow');
insert into cities(root,name) values('.','Tokio');
insert into cities(root,name) values('.','London');

select * from cities where root='.'; -- get'em sorted asc

When the keyspace has a replication factor of 3 and uses RandomPartitioner, there will be 3 replicas of each row on 3 nodes: the main node, chosen by the hash of the row's partition key, and the next 2 nodes. Why should there be a hotspot? Aren't reads load balanced across all replicas?

Recommended answer

With the table defined this way, the partition key is root, while name is a clustering key. As the name suggests, the partition key is responsible for partitioning. How does partitioning work? Let's say you have a 4-node cluster and a hash function that generates only 8 keys (A,B,C,D,E,F,G,H). Here is how the hashes are distributed in the cluster:

node 1 - (A,B)
node 2 - (C,D)
node 3 - (E,F)
node 4 - (G,H)

Each node uses the following 2 nodes as its replicas, so the replicas for node 1 are (2,3), the replicas for node 2 are (3,4), the replicas for node 3 are (4,1), and finally the replicas for node 4 are (1,2).
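The placement rule above can be sketched in a few lines of Python. This is a toy model of the 4-node, 8-hash example only, not Cassandra's real token ring; the names `NODES`, `TOKEN_OWNER` and `replicas` are made up for illustration.

```python
# Toy model of the 4-node example above (an assumption, not Cassandra internals).
NODES = [1, 2, 3, 4]

# Which node owns each of the 8 possible hash values.
TOKEN_OWNER = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3, "G": 4, "H": 4}


def replicas(token, replication_factor=3):
    """Return the owning node followed by the next nodes on the ring."""
    start = NODES.index(TOKEN_OWNER[token])
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


# Print the nodes that hold each hash value.
for token in "ABCDEFGH":
    print(token, replicas(token))
```

With a replication factor of 3, every hash value lands on exactly 3 of the 4 nodes, and which 3 depends only on the hash of the partition key.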

Let's say that hash(root), when root is '.', returns B, which belongs to node 1. Node 1 will store the data and nodes (2,3) will store the replicas. Node 4 is NEVER involved in the cities table: because the partition key is fixed, it will never contain any data for this table (apart from hints, which are outside the scope of this explanation). In this example you use about 75% of your cluster, which may look acceptable. But suppose that at some point your application suffers because the 3 nodes involved cannot handle the read/write load. You can add as many nodes as you want to the cluster, but with this data model you won't be able to scale horizontally, because NO OTHER NODE WILL EVER BE INVOLVED IN THE cities TABLE. The only way I see to solve the problem in that situation is to increase the power of those 3 nodes (vertical scaling) by adding more memory, more powerful CPUs and faster I/O. Creating a schema that does not allow horizontal scaling is an anti-pattern.
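To make the scaling argument concrete, here is a rough simulation. It assumes a simplified placement rule (MD5 of the partition key modulo the node count, then the next 2 nodes), which is not how Cassandra actually assigns tokens; the function `nodes_holding` and the `key-N` partition keys are hypothetical and exist only to provide a contrast.

```python
import hashlib


def nodes_holding(partition_keys, node_count, replication_factor=3):
    """Return the set of node indexes that store at least one of the keys."""
    used = set()
    for key in partition_keys:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        # Simplified placement: hash modulo node count (an assumption).
        primary = int.from_bytes(digest, "big") % node_count
        for i in range(replication_factor):
            used.add((primary + i) % node_count)
    return used


single_key = ["."]                             # the fixed root from the question
many_keys = [f"key-{i}" for i in range(100)]   # hypothetical varied keys

# Columns: cluster size, nodes used by the single key, nodes used by many keys.
for node_count in (4, 8, 16):
    print(node_count,
          len(nodes_holding(single_key, node_count)),
          len(nodes_holding(many_keys, node_count)))
```

However large the cluster grows, the single fixed partition key only ever touches 3 nodes, while a table whose partition key varies can spread over more of them.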
