Cassandra vnodes replicas


Problem Description

Setting up the context: Cassandra currently implements vnodes, 256 per node by default, tweakable via num_tokens in the cassandra.yaml file. Vnodes, as I understand them, are token ranges/hash ranges, e.g. (x...y], where y is the token number of the vnode. Each physical node in Cassandra is assigned 256 random tokens, and each of those tokens is the boundary value of a hash/token range. The assigned tokens lie within the range -2^63 to 2^63-1 (the range of hash values that the Murmur3 partitioner may generate). So far so good.

Questions:

1. Is a token range (vnode) a fixed range? Once set, is this token range copied to other Cassandra nodes to satisfy the replication factor, i.e. is a token range (vnode) a fundamental chunk of data (tokens) that moves around together, breaking apart to be assigned to other nodes only when a new node is bootstrapped into the cluster?

2. Riding on the last proposition (say it is true): a vnode must then contain only tokens that belong to a given keyspace, because each keyspace (the container of column families/tables) has a defined replication strategy and replication factor, and the replication factors of the keyspaces in a Cassandra cluster are very likely to vary. Consider an example: the "system_schema" keyspace has an RF of 1, whereas I created a keyspace "test_ks" with RF 3. If a row of the system_schema keyspace has token number 2 (say) and a row of my test_ks has token number 5 (say), these two tokens cannot be placed in the same token range (vnode). If a vnode were a contiguous chunk of token ranges, say tokens 2 and 5 both belonged to the vnode with token number 10, then vnode 10 would have to be placed on 3 different physical nodes to satisfy RF = 3 for test_ks, while we would be unnecessarily placing token 2 on 3 different nodes even though its RF is supposed to be 1.

   Is this proposition correct, that a vnode is dedicated to a single keyspace? That boils down to: out of the 256 tokens on a physical node, 20 (say) vnodes currently belong to the "system" keyspace and 80 (say) vnodes belong to test_ks.

3. Again riding on the above proposition, this would mean that each node has to know which vnodes currently belong to which keyspace across the cluster. That way, when a new write comes in for a keyspace, the coordinator node would locate all vnodes in the cluster for that keyspace and assign the new row a token number that falls within that keyspace's token ranges. If that is the case, can I find out how many vnodes currently belong to a keyspace in the entire cluster, or on a given node?

Please do correct me if I'm wrong. I have been following the blogs and videos below to get an understanding of this concept:

https://www.scribd.com/document/253239514/Virtual-Nodes-Strategies-for-Apache-Cassandra

https://www.youtube.com/watch?v=GddZ3pXiDys&t=11s

Thanks in advance.

Solution

There is no fixed token range; the tokens are just generated randomly. This is one of the reasons that vnodes were implemented - the idea being that if there are more tokens, it is more likely that the resulting token ranges will be more evenly distributed across nodes.
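
To see why more randomly drawn tokens tend to even things out, here is a toy Python sketch (not Cassandra's own code): it draws every token uniformly at random from the Murmur3 token space, as the pre-3.0 default did, and reports the fraction of the ring each node ends up owning. The node count and seed are arbitrary.

    import random

    MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1   # Murmur3Partitioner token space
    RING_SIZE = 2**64                          # total width of the token space

    def ownership(num_nodes, vnodes_per_node, seed=1):
        """Fraction of the token space each node owns when every token is
        drawn uniformly at random."""
        rng = random.Random(seed)
        ring = sorted(
            (rng.randint(MIN_TOKEN, MAX_TOKEN), node)
            for node in range(num_nodes)
            for _ in range(vnodes_per_node)
        )
        owned = [0] * num_nodes
        for i, (token, node) in enumerate(ring):
            next_token = ring[(i + 1) % len(ring)][0]
            # width of the range [token, next_token), wrapping around the ring
            owned[node] += (next_token - token) % RING_SIZE
        return [round(o / RING_SIZE, 3) for o in owned]

    print(ownership(6, 1))    # one token per node: ownership is typically quite uneven
    print(ownership(6, 256))  # 256 vnodes per node: every node ends up close to 1/6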

Token generation was recently improved in 3.0, allowing Cassandra to place new tokens a little more intelligently (see CASSANDRA-7032). You can also manually configure tokens (see initial_token), although it can become tricky to keep things balanced when it comes time to expand the cluster unless you plan on doubling the number of nodes.

The total number of tokens in a cluster is the number of nodes in the cluster multiplied by the number of vnodes per node.

In regards to placement of replicas, the first copy of a partition is placed on the node that owns that partition's token. The additional n copies are placed sequentially on the next n nodes in the ring that are in the same data centre. There is no relationship between tokens and keyspaces.

When a new write comes into a coordinator node, the coordinator node determines which node owns the partition by hashing the partition key. Note that for better performance this can actually be done by the driver instead if you use TokenAwarePolicy. The coordinator sends the write to the node that owns the partition, and if the data needs to be replicated the coordinator node also writes the replicas to the next two nodes sequentially in the token space.
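
As a concrete illustration of token-aware routing, here is a minimal sketch using the DataStax Python driver (an assumption - the question does not say which driver is in use); the contact point and datacenter name are placeholders.

    # Wrap the load-balancing policy in TokenAwarePolicy so the driver hashes the
    # partition key itself and routes each request to a replica that owns it,
    # rather than to an arbitrary coordinator.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='datacenter1'))
    )
    cluster = Cluster(['127.0.0.1'], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()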

For example, suppose that we have 3 nodes which each have one token: node1: 10, node2: 20 & node3: 30. If we write a record whose partition key hashes to 22, to a keyspace with RF 3, then the first copy goes to node2, the second goes to node3 and the third goes to node1. Note that each replica is equally valid - there is nothing special about the "first" replica other than that it happens to be stored on the "first" replica node.

Vnodes do not change this process; they just split up each node's token ranges by allowing each node to have more than one token. For example, if our cluster now has 2 vnodes for each node, it might instead look like this: node1: 10, 25, node2: 20, 3 & node3: 30, 21. Now our write that hashed to 22 goes to node3 (because it owns the range from 21 to 24), and the copies go to node1 and node2.
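
For anyone who wants to poke at this ring walk, here is a small Python sketch that reproduces both examples above. It is a simplification: real Cassandra hashes keys with Murmur3 and applies the replication strategy's datacenter and rack rules, but the walk itself is the same idea. It uses the same convention as the examples, namely that a node owns the range starting at its token.

    from bisect import bisect_right

    def replicas_for(token_map, partition_token, rf):
        """Return the distinct nodes holding the rf replicas of a partition.

        token_map: {token: node}, one entry per (v)node token.
        The owner is the node whose token is the largest one <= partition_token,
        wrapping around past the highest token.
        """
        ring = sorted(token_map)                         # e.g. [3, 10, 20, 21, 25, 30]
        start = bisect_right(ring, partition_token) - 1  # index of the owning token
        nodes = []
        for i in range(len(ring)):                       # walk clockwise around the ring
            node = token_map[ring[(start + i) % len(ring)]]
            if node not in nodes:                        # skip extra vnodes of a node we already chose
                nodes.append(node)
            if len(nodes) == rf:
                break
        return nodes

    # One token per node: node1: 10, node2: 20, node3: 30
    print(replicas_for({10: "node1", 20: "node2", 30: "node3"}, 22, 3))
    # -> ['node2', 'node3', 'node1']

    # Two vnodes per node: node1: 10, 25; node2: 20, 3; node3: 30, 21
    vnode_ring = {10: "node1", 25: "node1", 20: "node2", 3: "node2", 30: "node3", 21: "node3"}
    print(replicas_for(vnode_ring, 22, 3))
    # -> ['node3', 'node1', 'node2']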
