Understanding the Token Function in Cassandra

Question

Hello, I was reading the Cassandra documentation on the TOKEN function.

I am trying to implement pagination for a Cassandra table, but I am unable to understand the highlighted lines. The documentation discusses the difference between k > 42 and TOKEN(k) > TOKEN(42), but I do not understand the "token-based comparison".

I am looking forward to a detailed explanation of what the TOKEN function does when it is part of a WHERE clause.

Recommended answer

To decide which partition your data belongs in, C* (Cassandra) performs a calculation on the partition key of every row. Specifically, on each node, rows are sorted by the token that the partitioner generates (and within each partition, data is sorted by the clustering key). Different partitioners perform different calculations.

The Murmur3Partitioner calculates the MurmurHash of the partition key, whereas the ByteOrderedPartitioner uses the raw data bytes of the partition key itself. So with the Murmur3Partitioner your rows are sorted by their hashes, while with the ByteOrderedPartitioner they are sorted directly by their raw values.

As an example, assume you have a table like this:

CREATE TABLE test (
    username text,
    ...
    PRIMARY KEY (username)
);

And assume you're trying to locate where the rows corresponding to the usernames abcd, abce, and abcf are stored. The hex representations of these strings are 0x61626364, 0x61626365, and 0x61626366, respectively. Assuming we apply a MurmurHash3 (MH3) implementation (x86, 32-bit for simplicity, no optional seed) to all three strings, we get 0x43ED676A, 0xE297E8AA, and 0x87E62668, respectively. So, in the case of MH3, the tokens of the strings will be these three values, while in the case of the ByteOrderedPartitioner (BOP) the tokens will be the raw data values themselves: 0x61626364, 0x61626365, and 0x61626366.

Now you can see that storing data sorted by token produces different results when different partitioners are used: a SELECT * FROM test; query would return rows in a different order. This can be (but should not be) a problem if your data is already sorted by raw value and you need to retrieve it in that same order, because with MH3 the order is completely unrelated to your data.
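The effect on row order can be sketched in a few lines of Python. This is only an illustration, not how Cassandra is implemented: the hash values are copied from the answer rather than computed, the 32-bit values are compared as unsigned integers, and the names `MH3_TOKENS`, `bop_token`, `mh3_token`, and `scan_order` are made up for the example.

```python
# Illustrative only: simulates how the partitioner choice changes scan order.
# The 32-bit hash values are the ones quoted in the answer, not computed here.
MH3_TOKENS = {
    "abcd": 0x43ED676A,
    "abce": 0xE297E8AA,
    "abcf": 0x87E62668,
}


def bop_token(key: str) -> int:
    """ByteOrderedPartitioner-style token: the raw bytes of the key."""
    return int.from_bytes(key.encode("ascii"), "big")


def mh3_token(key: str) -> int:
    """Murmur3-style token, looked up from the values quoted in the answer."""
    return MH3_TOKENS[key]


def scan_order(keys, token):
    """Return keys in the order a full scan would yield them: sorted by token."""
    return sorted(keys, key=token)


usernames = ["abce", "abcf", "abcd"]
bop_order = scan_order(usernames, bop_token)
mh3_order = scan_order(usernames, mh3_token)
print("BOP order:", bop_order)
print("MH3 order:", mh3_order)
```

With the raw-byte tokens the scan follows the alphabetical order of the usernames, while with the hash tokens abcf comes before abce.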

Back to the question: the TOKEN function lets you filter by the tokens of your data rather than by the data itself. The documentation says:

ordering with the TOKEN function does not always provide the expected results. Use the TOKEN function to express a conditional relation on a partition key column. In this case, the query returns rows based on the token of the partition key rather than on the value.

As an example, you could issue:

SELECT * FROM test WHERE TOKEN(username) <= TOKEN('abcf');

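and guess what you'd get? The abcd and abcf rows! With the simplified MH3 hashes used here, compared as unsigned values, those are the two rows whose tokens do not exceed the token of abcf, even though abce sorts between them by value. This is because order sometimes matters, as in the pagination you're trying to do, which any available C* driver (e.g. the Java driver) will handle flawlessly for you.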
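To see why that query selects rows by token rather than by value, and how the same comparison can drive manual pagination, here is a small Python simulation. It is a sketch under the same simplifying assumptions as the answer (the 32-bit hashes quoted there, compared as unsigned values); `TOKENS`, `token_at_most`, and `next_page` are hypothetical helpers, not driver APIs. In CQL, the paging step would correspond to a query of the form `SELECT * FROM test WHERE TOKEN(username) > TOKEN(<last username seen>) LIMIT <page size>;`.

```python
# Illustrative only: an in-memory stand-in for the `test` table,
# mapping each username to the 32-bit hash quoted in the answer.
TOKENS = {"abcd": 0x43ED676A, "abce": 0xE297E8AA, "abcf": 0x87E62668}


def token_at_most(key):
    """Mimic WHERE TOKEN(username) <= TOKEN(key): compare tokens, not values."""
    limit = TOKENS[key]
    rows = [k for k in TOKENS if TOKENS[k] <= limit]
    return sorted(rows, key=TOKENS.get)


def next_page(last_key, page_size):
    """Mimic WHERE TOKEN(username) > TOKEN(last_key) LIMIT page_size.

    Pass last_key=None to fetch the first page.
    """
    floor = -1 if last_key is None else TOKENS[last_key]
    rows = sorted((k for k in TOKENS if TOKENS[k] > floor), key=TOKENS.get)
    return rows[:page_size]


filtered = token_at_most("abcf")
print("filtered:", filtered)

# Walk the whole table two rows at a time, resuming after the last row seen.
pages = []
last = None
while True:
    page = next_page(last, page_size=2)
    if not page:
        break
    pages.append(page)
    last = page[-1]
print("pages:", pages)
```

Because each page resumes strictly after the token of the last row returned, no row is skipped or repeated, whatever the alphabetical order of the usernames. Drivers implement paging for you, so a loop like this is only needed when you page by hand.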

That said, the recommended partitioner for new clusters is Murmur3Partitioner; you can check the documentation for the pros and cons of each partitioner. Please note that the partitioner is a cluster-wide setting, and once set you cannot change it without pushing all of your data into another cluster.

Choose carefully.
