ORDER BY clause not working in Cassandra query


Problem description


I created a table named layer using the following code:

CREATE TABLE layer (
    layer_name text,
    layer_position text,
    PRIMARY KEY (layer_name, layer_position)
) WITH CLUSTERING ORDER BY (layer_position DESC);

I use the query below to fetch rows from the layer table in descending order of layer_position:

$select = new Cassandra\SimpleStatement(<<<EOD
    SELECT * FROM layer ORDER BY layer_position DESC
EOD
);

$result = $session->execute($select);

But this query does not work. Can anyone help?

Solution

Simply put, Cassandra only enforces sort order within a partition.

PRIMARY KEY (layer_name, layer_position)
) WITH CLUSTERING ORDER BY (layer_position DESC)

In this case, layer_name is your partition key. If you specify layer_name in your WHERE clause, your results for that value of layer_name will be ordered by layer_position.

SELECT * FROM layer WHERE layer_name = 'layer1';

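You don't need to specify ORDER BY. At the query level, all ORDER BY can really do is apply a different sort direction (ascending vs. descending) to the clustering order, and only when the partition key is restricted in the WHERE clause. That is why the original query, which has no WHERE clause, is rejected.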

Cassandra works this way because it is designed to read data in whatever order it is sorted on disk. Partitions are ordered by the hashed token value of their partition key, which is why results from a query that does not restrict the partition key appear to be ordered randomly.
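The ordering described above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: the `token` function uses MD5 as a stand-in for the partitioner's hash (real clusters default to Murmur3), and the `rows` data and helper names are made up for illustration.

```python
import hashlib

def token(partition_key: str) -> int:
    """Stand-in for the partitioner hash (assumption: MD5 instead of Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Hypothetical rows: (layer_name, layer_position)
rows = [
    ("layer1", "a"), ("layer1", "c"), ("layer1", "b"),
    ("layer2", "b"), ("layer2", "a"),
    ("layer3", "a"),
]

def on_disk_order(rows):
    """Partitions sorted by token; rows inside each partition by layer_position DESC."""
    by_position_desc = sorted(rows, key=lambda r: r[1], reverse=True)
    # sorted() is stable, so the DESC order survives inside each partition.
    return sorted(by_position_desc, key=lambda r: (token(r[0]), r[0]))

def select_partition(rows, layer_name, ascending=False):
    """Model of: SELECT * FROM layer WHERE layer_name = ? [ORDER BY layer_position ASC]."""
    partition = [r for r in on_disk_order(rows) if r[0] == layer_name]
    # ORDER BY can only read the partition forwards or backwards.
    return partition[::-1] if ascending else partition

# Within one partition the rows come back in clustering order.
print(select_partition(rows, "layer1"))
# A full scan follows token order, which looks random relative to the key values.
print([name for name, _ in on_disk_order(rows)])
```

In this model a full scan is sorted only inside each partition, which mirrors why a global ORDER BY across all partitions is not something the storage layout can serve.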

EDIT

I have to fetch data using the state_id column, and it should be ordered by layer_position.

Cassandra tables are optimized for a specific query. While this results in high performance, the drawback is that query flexibility is limited. The way to solve this is to duplicate your data into an additional table designed to serve that particular query.

CREATE TABLE layer_by_state_id (
    layer_name text,
    layer_position text,
    state_id text,
    PRIMARY KEY (state_id, layer_position, layer_name)
) WITH CLUSTERING ORDER BY (layer_position DESC, layer_name ASC);

This table will allow queries like this to work:

SELECT * FROM layer_by_state_id WHERE state_id = 'thx1138';

And the results will be sorted by layer_position, within the requested state_id.
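Duplicating data means every write has to go to both tables. A minimal Python sketch of that idea follows. The helper `writes_for` is hypothetical, the `%s` placeholders follow the style the DataStax Python driver accepts for simple statements (adapt them to your driver), and the column lists are taken from the two table definitions above.

```python
INSERT_LAYER = (
    "INSERT INTO layer (layer_name, layer_position) VALUES (%s, %s)"
)
INSERT_LAYER_BY_STATE = (
    "INSERT INTO layer_by_state_id (state_id, layer_position, layer_name) "
    "VALUES (%s, %s, %s)"
)

def writes_for(layer_name, layer_position, state_id):
    """Return one (cql, params) pair per table that stores this layer."""
    return [
        (INSERT_LAYER, (layer_name, layer_position)),
        (INSERT_LAYER_BY_STATE, (state_id, layer_position, layer_name)),
    ]

# Usage: with a real driver session you would execute each pair, for example
# session.execute(cql, params). Here the statements are only printed.
for cql, params in writes_for("layer1", "3", "thx1138"):
    print(cql, params)
```

Keeping the list of statements in one place makes it harder to update one table and forget the other. Whether the two writes should be grouped in a batch depends on your consistency needs and is left out of this sketch.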

Note that I am making a few assumptions here that you will want to investigate:

  • I am assuming that state_id is a good partition key, meaning that it has high enough cardinality to offer good distribution in the cluster, but low enough cardinality that each partition returns enough CQL rows to make sorting worthwhile.
  • I am assuming that the combination of state_id and layer_position is not enough to uniquely identify each row. Therefore I am ensuring uniqueness by adding layer_name as an additional clustering key. You may or may not need this, but I'm guessing that you will.
  • I am assuming that using state_id as a partition key will not lead to unbounded growth that approaches Cassandra's limit of 2 billion cells per partition. If it does, you may need to add an additional partition "bucket."
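The "bucket" idea from the last point can be sketched in Python. Everything here is an assumption for illustration: the bucket count, the choice of CRC32 over layer_name, and a hypothetical primary key of ((state_id, bucket), layer_position, layer_name). With such a key, one state_id is spread over several partitions, so a reader has to query each bucket and merge the already-sorted results.

```python
import heapq
import zlib

NUM_BUCKETS = 4  # hypothetical; size it from the expected rows per state_id

def bucket_for(layer_name: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Stable bucket number; crc32 is used because Python's hash() is salted per process."""
    return zlib.crc32(layer_name.encode("utf-8")) % num_buckets

def merge_buckets(per_bucket_rows):
    """Merge per-bucket results, each already sorted by layer_position DESC.

    Rows are (layer_position, layer_name) tuples.
    """
    return list(heapq.merge(*per_bucket_rows, key=lambda r: r[0], reverse=True))

# Usage with made-up rows, as if two buckets had been queried separately.
bucket_a = [("c", "roads"), ("a", "rivers")]
bucket_b = [("b", "parcels")]
print(merge_buckets([bucket_a, bucket_b]))
print(bucket_for("roads"))
```

The trade-off is that reads for one state_id now touch up to NUM_BUCKETS partitions, so bucketing is only worth adding if partition growth is a real concern.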
