Order By any field in Cassandra

Question

I am researching Cassandra as a possible solution for my upcoming project. The more I research, the more I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created.

Is it possible to sort on any field? If there is a performance impact from sorting on fields not in the cluster, what is that performance impact? I need to sort around 2 million records in the table.

Recommended answer

I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created.

It's not so much that it's a bad idea. It's just really not possible to make Cassandra sort your data by an arbitrary column. Cassandra requires a query-based modeling approach, and that goes for sort order as well. You have to decide ahead of time the kinds of queries you want Cassandra to support, and the order in which those queries return their data.

Is it possible to sort on any field?

Here's the thing with how Cassandra sorts result sets: it doesn't. Cassandra queries correspond to partition locations, and the data is read off of the disk and returned to you. If the data is read in the same order in which it was sorted on disk, the result set will be sorted. On the other hand, if you try a multi-key query or an index-based query where it has to jump around to different partitions, chances are that the data will not be returned in any meaningful order.

But if you plan ahead, you can actually influence the on-disk sort order of your data, and then leverage that order in your queries. This can be done with a modeling mechanism called a "clustering column." Cassandra will allow you to specify multiple clustering columns, but they are only valid within a single partition.

So what does that mean? Take this example from the DataStax documentation.

CREATE TABLE playlists (
  id uuid,
  artist text,
  album text,
  title text,
  song_order int,
  song_id uuid,
  PRIMARY KEY ((id),song_order))
WITH CLUSTERING ORDER BY (song_order ASC);

With this table definition, I can query a particular playlist by id (the partition key). Within each id, the data will be returned ordered by song_order:

SELECT id, song_order, album, artist, title 
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY song_order DESC;

id                                   | song_order | album                 | artist         | title
------------------------------------------------------------------------------------------------------------------
62c36092-82a1-3a00-93d1-46196ee77204 | 4          | No One Rides For Free |      Fu Manchu |             Ojo Rojo    
62c36092-82a1-3a00-93d1-46196ee77204 | 3          |             Roll Away | Back Door Slam |  Outside Woman Blues
62c36092-82a1-3a00-93d1-46196ee77204 | 2          |          We Must Obey |      Fu Manchu |     Moving in Stereo
62c36092-82a1-3a00-93d1-46196ee77204 | 1          |          Tres Hombres |         ZZ Top |            La Grange

In this example, I only need to specify an ORDER BY if I want to switch the sort direction. As the rows are stored in ASCending order, I need to specify DESC to see them in DESCending order. If I were fine with getting the rows back in ASCending order, I wouldn't need to specify ORDER BY at all.
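
To make the "only valid within a single partition" point concrete, here is a minimal sketch against the playlists table above. The comments describe the behavior I would expect from Cassandra; they are not output captured from a running cluster.

```cql
-- No partition key restriction: I would expect Cassandra to reject this,
-- because there is no single on-disk order that spans every partition.
SELECT id, song_order, title
FROM playlists
ORDER BY song_order DESC;

-- Partition key restricted with "=": rows come back in the stored
-- (ASCending) clustering order, with no ORDER BY needed.
SELECT id, song_order, title
FROM playlists
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
```

The second query is cheap because it reads one partition in the order it is already stored, which is the whole idea behind clustering columns.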

But what if I want to order by artist? Or album? Or both? Since one artist can have many albums (for this example), we'll modify the PRIMARY KEY definition like this:

PRIMARY KEY ((id),artist,album,song_order)
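
Spelled out in full, the modified table might look like the sketch below. The explicit CLUSTERING ORDER BY clause is my addition for illustration (ascending is the default, so it could be omitted). Keep in mind that a primary key cannot be altered on an existing table, so in practice this means creating the table fresh and reloading the data.

```cql
-- Same columns as before; only the primary key changes.
CREATE TABLE playlists (
  id uuid,
  artist text,
  album text,
  title text,
  song_order int,
  song_id uuid,
  -- id is still the partition key; artist, album, and song_order
  -- are now clustering columns, applied in that order.
  PRIMARY KEY ((id), artist, album, song_order))
WITH CLUSTERING ORDER BY (artist ASC, album ASC, song_order ASC);
```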

Running the same query above (minus the ORDER BY) produces this output:

SELECT id, song_order, album, artist, title 
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;

id                                   | song_order | album                 | artist         | title
------------------------------------------------------------------------------------------------------------------
62c36092-82a1-3a00-93d1-46196ee77204 | 3          |             Roll Away | Back Door Slam |  Outside Woman Blues
62c36092-82a1-3a00-93d1-46196ee77204 | 4          | No One Rides For Free |      Fu Manchu |             Ojo Rojo    
62c36092-82a1-3a00-93d1-46196ee77204 | 2          |          We Must Obey |      Fu Manchu |     Moving in Stereo
62c36092-82a1-3a00-93d1-46196ee77204 | 1          |          Tres Hombres |         ZZ Top |            La Grange

Notice that the rows are now ordered by artist, and then album. If we had two songs from the same album, then song_order would be next.

So now you might ask "what if I just want to sort by album, and not artist?" You can sort just by album, but not with this table. You cannot skip clustering keys in your ORDER BY clause. In order to sort only by album (and not artist) you'll need to design a different query table. Sometimes Cassandra data modeling will have you duplicating your data a few times, to be able to serve different queries...and that's ok.
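
As a rough sketch of what such a query table could look like: the name playlists_by_album is hypothetical, and the column set is simply copied from the playlists table above.

```cql
-- Hypothetical second table holding the same data, clustered by album first.
CREATE TABLE playlists_by_album (
  id uuid,
  album text,
  song_order int,
  artist text,
  title text,
  song_id uuid,
  -- artist is now a regular column, so it no longer affects sort order.
  PRIMARY KEY ((id), album, song_order))
WITH CLUSTERING ORDER BY (album ASC, song_order ASC);

-- Rows for this playlist come back ordered by album, then song_order.
SELECT id, album, song_order, artist, title
FROM playlists_by_album
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
```

The trade-off is that the application has to write every song to both tables, so that each query reads from a table whose on-disk order already matches what it needs.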

For more detail on how to build data models while leveraging clustering order, check out these two articles on PlanetCassandra:

  • Getting Started With Time Series Data Modeling - Patrick McFadin
  • We Shall Have Order! (disclaimer: I am the author)
