Clustering Keys in Cassandra

Question

The CQL documentation says: "On a given physical node, rows for a given partition key are stored in the order induced by the clustering keys, making the retrieval of rows in that clustering order particularly efficient." (http://cassandra.apache.org/doc/cql3/CQL.html#createTableStmt) What kind of ordering is induced by clustering keys?

Recommended answer

Suppose your clustering keys are

k1 t1, k2 t2, ..., kn tn

where ki is the ith key name and ti is the ith key type. The data is then stored in lexicographic order, with each dimension compared using the comparator for its type.

So (a1, a2, ..., an) < (b1, b2, ..., bn) if a1 < b1 using the t1 comparator, or a1 = b1 and a2 < b2 using the t2 comparator, or a1 = b1 and a2 = b2 and a3 < b3 using the t3 comparator, and so on.
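The comparison rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not Cassandra's implementation; the names compare_keys and natural are hypothetical, and natural stands in for whatever comparator each key type actually uses.

```python
def compare_keys(a, b, comparators):
    """Compare two clustering key tuples; return -1, 0 or 1."""
    for ai, bi, cmp_i in zip(a, b, comparators):
        c = cmp_i(ai, bi)
        if c != 0:
            return c  # the first dimension that differs decides the order
    return 0  # every dimension compared equal


def natural(x, y):
    # Stand-in comparator (assumption): the value's natural ordering.
    return (x > y) - (x < y)


comparators = [natural, natural, natural]  # one comparator per key type t1..t3

print(compare_keys(("a", 2, 5), ("b", 1, 0), comparators))  # -1: decided by k1
print(compare_keys(("a", 1, 5), ("a", 2, 0), comparators))  # -1: k1 ties, decided by k2
print(compare_keys(("a", 1, 5), ("a", 1, 5), comparators))  # 0: identical keys
```

Later dimensions are consulted only when all earlier ones tie, which is why the first clustering key dominates the ordering.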

This means that it is efficient to find all rows with a certain k1 = a, since that data is stored together. Finding all rows with ki = x for some i > 1 is inefficient, because the matching rows are scattered through the partition. In fact, such a query isn't allowed: the only clustering key constraints that are permitted restrict zero or more clustering keys, starting from the first, with none skipped.
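The "starting from the first, with none skipped" rule amounts to a prefix check. Here is a minimal Python sketch of that rule, assuming equality restrictions only; is_allowed is a hypothetical helper written for this illustration, and it models the rule as stated here rather than Cassandra's actual query validation, which handles more cases (range restrictions, for example).

```python
def is_allowed(restricted, clustering_columns=("k1", "k2", "k3")):
    """Return True if the restricted columns form a prefix of the clustering columns."""
    restricted = set(restricted)
    # The first len(restricted) clustering columns, in declaration order.
    prefix = set(clustering_columns[:len(restricted)])
    return restricted == prefix


print(is_allowed([]))            # True: no clustering key restricted
print(is_allowed(["k1"]))        # True
print(is_allowed(["k1", "k2"]))  # True
print(is_allowed(["k2"]))        # False: k1 is skipped
print(is_allowed(["k1", "k3"]))  # False: k2 is skipped
```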

For example, consider the schema

create table clustering (
    x text,
    k1 text,
    k2 int,
    k3 timestamp,
    y text,
    primary key (x, k1, k2, k3)
);

If you do the following inserts:

insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 1, '2013-09-10 14:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'a', 2, '2013-09-10 13:00+0000', '1');
insert into clustering (x, k1, k2, k3, y) values ('x', 'b', 1, '2013-09-10 14:00+0000', '1');

then they are stored on disk in this order (the order that select * from clustering where x = 'x' returns):

 x | k1 | k2 | k3                       | y
---+----+----+--------------------------+---
 x |  a |  1 | 2013-09-10 14:00:00+0000 | 1
 x |  a |  2 | 2013-09-10 13:00:00+0000 | 1
 x |  b |  1 | 2013-09-10 13:00:00+0000 | 1
 x |  b |  1 | 2013-09-10 14:00:00+0000 | 1

The k1 ordering dominates, then k2, then k3.
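As a sanity check, the same order can be reproduced outside Cassandra. The Python sketch below models each row's clustering key as a tuple (k1, k2, k3) and relies on Python's built-in tuple comparison, which is lexicographic. It assumes the per-type comparators agree with Python's ordering for str, int and datetime, which holds for the values used here; the ts helper is hypothetical and exists only for this example.

```python
from datetime import datetime, timezone


def ts(hour):
    # Hypothetical helper: 2013-09-10 at the given hour, UTC.
    return datetime(2013, 9, 10, hour, 0, tzinfo=timezone.utc)


# Clustering key tuples (k1, k2, k3), in the order the rows were inserted.
inserted = [
    ("a", 1, ts(14)),
    ("b", 1, ts(13)),
    ("a", 2, ts(13)),
    ("b", 1, ts(14)),
]

# Tuples compare lexicographically: k1 first, then k2, then k3.
stored = sorted(inserted)

for k1, k2, k3 in stored:
    print(k1, k2, k3.strftime("%Y-%m-%d %H:%M:%S%z"))
```

The printed rows come out in the same order as the query result above: both 'a' rows first (ordered by k2), then the two 'b' rows, which tie on k2 and are ordered by k3.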
