Cassandra sharding and replication


Question

I am new to Cassandra. I was going through this article explaining sharding and replication, and I am stuck at one point:

I have a cluster with 6 Cassandra nodes configured on my local machine. I create a new keyspace "TestKeySpace" with a replication factor of 6, and in that keyspace a table "employee" whose primary key is an auto-incrementing number named RID. I am not able to understand how this data will be partitioned and replicated. What I want to know is: since my replication factor is 6 and the data is distributed over multiple nodes, will each node hold exactly the same data as the other nodes or not?

What if my cluster has the following configuration:

    Number of nodes - 6 (n1, n2 ,n3, n4, n5 and n6).
    replication_factor - 3. 

How can I determine, for any one node (let's say n1), on which other two nodes its data is replicated and which other nodes are behaving as different shards?

Thanks in advance.

Regards, Vibhav

PS - If anybody downvotes this question, kindly mention in the comments what went wrong.

Recommended answer

I will explain this with a simple example. A keyspace in Cassandra is equivalent to a database schema name in an RDBMS.

First, create a keyspace:

CREATE KEYSPACE MYKEYSPACE WITH REPLICATION = { 
 'class' : 'SimpleStrategy', 
 'replication_factor' : 3 
};

Let's create a simple table:

CREATE TABLE USER_BY_USERID(
 userid int,
 name text,
 email text,
 PRIMARY KEY(userid, name)
) WITH CLUSTERING ORDER BY(name  DESC);

In this example, userid is the partition key and name is the clustering key. The partition key is also called the row key; it determines on which node a row will be saved.

Your first question:

I am not able to understand how this data will be partitioned?

Data is partitioned based on your partition key. By default, C* uses Murmur3Partitioner. You can change the partitioner in the cassandra.yaml configuration file. How the partitions are laid out also depends on your configuration. You can assign token ranges to nodes by setting initial_token on each one; for example, take a look at the cassandra.yaml excerpts below. I have specified 6 nodes, as in your question.

cassandra.yaml for Node 0:

cluster_name: 'MyCluster'
initial_token: 0
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 198.211.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

cassandra.yaml for Node 1:

cluster_name: 'MyCluster'
initial_token: 3074457345618258602
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 192.241.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

cassandra.yaml for Node 2:

cluster_name: 'MyCluster'
initial_token: 6148914691236517205
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 37.139.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

.... Node 3 ...... Node 4 ....

cassandra.yaml for Node 5:

cluster_name: 'MyCluster'
initial_token: {some large number}
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 37.139.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Consider this insert statement:

INSERT INTO USER_BY_USERID (userid, name, email) VALUES (
 1,
 'Darth Veder',
 'darthveder@star-wars.com'
);

The partitioner calculates the hash (the token) of the partition key (userid 1 in the example above), and that token decides which node stores the row. Each node owns the range of tokens from the previous node's token (exclusive) up to its own initial_token (inclusive). Let's say the calculated hash is -12345: this row will be saved on Node 0, whose initial_token of 0 is the first one at or above the hash (see the initial_token value for Node 0 in the configuration above).
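The lookup can be sketched in a few lines of Python. This is only an illustration of the token-ring rule, not Cassandra's actual code: it takes an already computed token instead of running Murmur3, the function name primary_node is made up for this example, and the tokens for Node 3 to Node 5 are hypothetical placeholders because the excerpts above do not give them.

```python
from bisect import bisect_left

# Tokens for Node 0-2 come from the cassandra.yaml excerpts above.
# The tokens for Node 3-5 are hypothetical placeholders for illustration.
RING = [
    (0, "Node0"),
    (3074457345618258602, "Node1"),
    (6148914691236517205, "Node2"),
    (7000000000000000000, "Node3"),  # hypothetical
    (8000000000000000000, "Node4"),  # hypothetical
    (9000000000000000000, "Node5"),  # hypothetical
]


def primary_node(token, ring=RING):
    """Return the node that owns `token`.

    A node owns the tokens greater than the previous node's token and up to
    (and including) its own token. Tokens beyond the highest node token wrap
    around to the node with the lowest token.
    """
    tokens = [t for t, _ in ring]
    index = bisect_left(tokens, token) % len(ring)
    return ring[index][1]


print(primary_node(-12345))               # Node0
print(primary_node(3074457345618258609))  # Node2
```

Sorting the ring by token and using a binary search keeps the lookup simple; the modulo handles the wrap-around at the end of the ring.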

For the complete cassandra.yaml configuration, see configCassandra_yaml_r (https://docs.datastax.com/zh-CN/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html).

You can go through deployCalcTokens to learn how to generate tokens.
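As a rough sketch of what such a token calculation looks like, the snippet below spreads initial_token values evenly over the Murmur3 token range (-2^63 to 2^63 - 1). The function name murmur3_tokens is made up for this example, and the values it prints are not the same as the illustrative tokens used in the configuration excerpts above.

```python
def murmur3_tokens(node_count):
    """Evenly spaced initial tokens over the Murmur3 range [-2**63, 2**63 - 1]."""
    step = 2**64 // node_count
    return [step * i - 2**63 for i in range(node_count)]


# Print one initial_token per node for a 6-node cluster.
for i, token in enumerate(murmur3_tokens(6)):
    print(f"Node {i}: initial_token: {token}")
```

Each printed value would go into the initial_token setting of the corresponding node's cassandra.yaml.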

Your second question:

How does the data get replicated?

Depending on your replication strategy and replication factor, each row is replicated to a number of nodes. You have to specify the replication factor and the replication strategy when creating the keyspace. In the example above, I have used SimpleStrategy as the replication strategy. This strategy is suitable for a small cluster. For a geographically distributed application you can use NetworkTopologyStrategy. replication_factor specifies how many copies of each row are stored; in this example, three copies of each row will be created. With SimpleStrategy, Cassandra places the additional copies on the next nodes in the clockwise direction around the ring.

In the example above, the row is saved on Node 0 and the same row is copied to Node 1 and Node 2. Let's take another example:

INSERT INTO USER_BY_USERID (userid, name, email) VALUES (
 448454,
 'Obi wan kenobi',
 'obiwankenobi@star-wars.com'
);

For userid 448454, say the calculated hash is 3074457345618258609. This row will be saved on Node 2 (compare the hash with the initial_token values for Node 1 and Node 2 in the configuration above) and also copied in the clockwise direction to Node 3 and Node 4. Remember that we specified a replication factor of 3, so there are only three copies: Node 2, Node 3 and Node 4.
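The clockwise placement can be sketched in Python as follows. This is a simplified model of SimpleStrategy, not Cassandra's implementation: the function name replicas is made up for this example, the tokens for Node 3 to Node 5 are hypothetical placeholders, and capping the replica count at the number of nodes is an assumption of the sketch.

```python
from bisect import bisect_left

# Tokens for Node 0-2 come from the cassandra.yaml excerpts above.
# The tokens for Node 3-5 are hypothetical placeholders for illustration.
RING = [
    (0, "Node0"),
    (3074457345618258602, "Node1"),
    (6148914691236517205, "Node2"),
    (7000000000000000000, "Node3"),  # hypothetical
    (8000000000000000000, "Node4"),  # hypothetical
    (9000000000000000000, "Node5"),  # hypothetical
]


def replicas(token, replication_factor, ring=RING):
    """Return the nodes holding a row: the owner of `token`, then the next
    nodes clockwise around the ring until `replication_factor` is reached."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    # Assumption of this sketch: never place more copies than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


print(replicas(3074457345618258609, 3))  # ['Node2', 'Node3', 'Node4']
print(replicas(8500000000000000000, 3))  # ['Node5', 'Node0', 'Node1']
print(len(replicas(-12345, 6)))          # 6
```

In this model, a replication factor of 6 on a six-node ring puts a copy of every row on every node, which is the situation described in the question. With a replication factor of 3, the data owned by one node is also found on the two nodes that follow it clockwise.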

Hope this helps.
