Understanding the Token Function in Cassandra

Question

Hello, I was reading the Cassandra documentation on the TOKEN function.

I am trying to implement pagination for a Cassandra table, but I don't understand the highlighted lines. The documentation describes the difference between k > 42 and TOKEN(k) > TOKEN(42), but I am not able to understand the "token-based comparison".

I'm looking for a detailed explanation of what the TOKEN function does when it is part of a WHERE clause.

Recommended Answer

To decide where (that is, on which node) your data should be stored, C* performs a calculation on the PARTITION KEY of every row. Specifically, on each node, rows are sorted by the token the partitioner generates from the partition key (and within each partition, rows are sorted by the clustering key). Different partitioners perform different types of calculations.

While the Murmur3Partitioner calculates the MurmurHash of the partition key, the ByteOrderedPartitioner uses the raw bytes of the partition key itself: with the Murmur3Partitioner your rows are sorted by their hashes, while with the ByteOrderedPartitioner they are sorted directly by their raw values.

As an example, assume you have a table like this:

CREATE TABLE test (
    username text,
    ...
    PRIMARY KEY (username)
);

And assume you're trying to locate where the rows corresponding to the usernames abcd, abce, and abcf are stored. The hex representations of these strings are 0x61626364, 0x61626365, and 0x61626366 respectively. Applying a MurmurHash3 (MH3) implementation (the x86 32-bit variant for simplicity, with no optional seed) to the three strings gives 0x43ED676A, 0xE297E8AA, and 0x87E62668 respectively. So, with MH3 the tokens of the strings are these three hash values, while with the ByteOrderedPartitioner (BOP) the tokens are the raw data values themselves: 0x61626364, 0x61626365, and 0x61626366.
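
To reproduce the worked example, here is a minimal Python sketch of MurmurHash3 x86_32 (seed 0), written after the public reference algorithm, next to the BOP "token", which is just the key's raw bytes. It is only the simplified 32-bit variant used above for illustration: Cassandra's actual Murmur3Partitioner is based on the 128-bit x64 variant and produces signed 64-bit tokens, so real token values differ. The function names (murmur3_x86_32, bop_token_hex) are hypothetical helpers, not Cassandra APIs.

```python
def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def fmix32(h: int) -> int:
    """Final avalanche mix of MurmurHash3."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant (unsigned result). Illustrative only."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & 0xFFFFFFFF
    n_blocks = len(data) // 4

    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(n_blocks):
        k1 = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k1 = (k1 * c1) & 0xFFFFFFFF
        k1 = rotl32(k1, 15)
        k1 = (k1 * c2) & 0xFFFFFFFF
        h1 ^= k1
        h1 = rotl32(h1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k1 = int.from_bytes(tail, "little")
        k1 = (k1 * c1) & 0xFFFFFFFF
        k1 = rotl32(k1, 15)
        k1 = (k1 * c2) & 0xFFFFFFFF
        h1 ^= k1

    # Finalization: fold in the length, then avalanche.
    h1 ^= len(data)
    return fmix32(h1)


def bop_token_hex(username: str) -> str:
    """ByteOrderedPartitioner-style token: the key's raw bytes, in hex."""
    return "0x" + username.encode("utf-8").hex().upper()


for name in ("abcd", "abce", "abcf"):
    mh3 = murmur3_x86_32(name.encode("utf-8"))
    print(f"{name}: MH3 token 0x{mh3:08X}, BOP token {bop_token_hex(name)}")
```

For abcd the sketch yields 0x43ED676A, matching the value quoted above, while the BOP column is simply the hex encoding of each string, so it preserves the strings' natural order.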

Now you can see that storing data sorted by token produces different orderings depending on the partitioner: a SELECT * FROM test; query would return the rows in a different order under each one. This can (though it should not) be a problem if your data is already sorted by its raw values and you need to retrieve it in that same order, because with MH3 the order is completely unrelated to your data.
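
The effect on a full-table scan can be simulated in a few lines. This Python sketch hard-codes the three MH3 tokens quoted above (compared as unsigned 32-bit integers, as in the simplified example) and sorts the same rows by each partitioner's token. The helpers scan_order, mh3_token, and bop_token are hypothetical, and nothing here talks to a real cluster.

```python
# Tokens quoted in the answer (MurmurHash3 x86_32, seed 0), compared as
# unsigned 32-bit integers as in the simplified example. Real Cassandra
# Murmur3Partitioner tokens are signed 64-bit values.
MH3_TOKENS = {
    "abcd": 0x43ED676A,
    "abce": 0xE297E8AA,
    "abcf": 0x87E62668,
}


def mh3_token(username: str) -> int:
    """Look up the precomputed MH3 token for a username."""
    return MH3_TOKENS[username]


def bop_token(username: str) -> bytes:
    """ByteOrderedPartitioner-style token: the raw bytes of the key."""
    return username.encode("utf-8")


def scan_order(usernames, token_fn):
    """Order in which rows come back when storage is sorted by token_fn."""
    return sorted(usernames, key=token_fn)


rows = ["abcf", "abce", "abcd"]  # insertion order does not matter
print("MH3 order:", scan_order(rows, mh3_token))  # ['abcd', 'abcf', 'abce']
print("BOP order:", scan_order(rows, bop_token))  # ['abcd', 'abce', 'abcf']
```

Under BOP the scan order coincides with the alphabetical order of the usernames; under MH3 it follows the hash values, which is why abce ends up last.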

Back to the question: the TOKEN function lets you filter directly by the tokens of your data instead of by the data values themselves. The documentation says:


Ordering with the TOKEN function does not always provide the expected results. Use the TOKEN function to express a conditional relation on a partition key column. In this case, the query returns rows based on the token of the partition key rather than on the value.

For example, you can issue:

SELECT * FROM test WHERE TOKEN(username) <= TOKEN('abcf');

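And guess what you get? The abcd and abcf rows, but not abce! In this example, abce's token (0xE297E8AA) is greater than abcf's token (0x87E62668), so abce falls outside the range even though 'abce' < 'abcf' as a string. This is why order sometimes matters, as in the pagination you're trying to implement. Fortunately, any available C* driver (e.g. the Java driver) handles paging flawlessly for you.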
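
To see why only abcd and abcf come back, and how token comparisons support paging, here is a hedged Python simulation over the same three tokens. It assumes, as in the test table, that each partition holds exactly one row, so resuming with TOKEN(username) > TOKEN(last_username) never skips rows. The helper names are hypothetical, no driver API is involved, and naive_value_le is only a contrast showing what a value-based comparison would select; it is not a CQL query.

```python
from typing import Iterator, List

# Hypothetical setup: the MH3 tokens quoted in the answer, compared as
# unsigned 32-bit integers. One row per partition (PRIMARY KEY (username)).
MH3_TOKENS = {
    "abcd": 0x43ED676A,
    "abce": 0xE297E8AA,
    "abcf": 0x87E62668,
}


def token_le(usernames: List[str], upper: str) -> List[str]:
    """Simulates WHERE TOKEN(username) <= TOKEN(upper), in token order."""
    limit = MH3_TOKENS[upper]
    hits = [u for u in usernames if MH3_TOKENS[u] <= limit]
    return sorted(hits, key=MH3_TOKENS.__getitem__)


def naive_value_le(usernames: List[str], upper: str) -> List[str]:
    """Contrast: what comparing the raw values would select instead."""
    return sorted(u for u in usernames if u <= upper)


def pages_by_token(usernames: List[str], page_size: int) -> Iterator[List[str]]:
    """Token-based paging: each page resumes after the last token seen,
    like WHERE TOKEN(username) > TOKEN(last_username) LIMIT page_size."""
    ordered = sorted(usernames, key=MH3_TOKENS.__getitem__)
    last_token = None
    while True:
        page = [u for u in ordered
                if last_token is None or MH3_TOKENS[u] > last_token][:page_size]
        if not page:
            return
        yield page
        last_token = MH3_TOKENS[page[-1]]


rows = ["abcd", "abce", "abcf"]
print(token_le(rows, "abcf"))         # ['abcd', 'abcf']
print(naive_value_le(rows, "abcf"))   # ['abcd', 'abce', 'abcf']
print(list(pages_by_token(rows, 2)))  # [['abcd', 'abcf'], ['abce']]
```

The resume-after-last-token approach relies on the one-row-per-partition assumption: with wider partitions, a page boundary could fall inside a partition and the remaining rows of that partition would be skipped, which is one more reason to let the driver's built-in paging do the work.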

That said, the recommended partitioner for new clusters is Murmur3Partitioner; you can check the documentation for the pros and cons of each partitioner. Note that the partitioner is a cluster-wide setting: once it is set, you cannot change it without moving all of your data to another cluster.

Choose carefully.
