Order By any field in Cassandra

Question

I am researching Cassandra as a possible solution for my upcoming project. The more I research, the more I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created.

Is it possible to sort on any field? If sorting on fields that are not part of the clustering key has a performance impact, what is that impact? I need to sort roughly 2 million records in the table.

Answer

"I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created."

It's not so much that it's a bad idea. It's just really not possible to make Cassandra sort your data by an arbitrary column. Cassandra requires a query-based modeling approach, and that goes for sort order as well. You have to decide ahead of time the kinds of queries you want Cassandra to support, and the order in which those queries return their data.

"Is it possible to sort on any field?"

Here's the thing with how Cassandra sorts result sets: it doesn't. Cassandra queries correspond to partition locations, and the data is read off of the disk and returned to you. If the data is read in the same order in which it is sorted on disk, the result set will be sorted. On the other hand, if you try a multi-key query or an index-based query that has to jump around to different partitions, chances are that the rows will not be returned in any meaningful order.

But if you plan ahead, you can actually influence the on-disk sort order of your data, and then leverage that order in your queries. This can be done with a modeling mechanism called a "clustering column." Cassandra will allow you to specify multiple clustering columns, but the order they impose only applies within a single partition.

So what does that mean? Take this example from the DataStax documentation.

CREATE TABLE playlists (
  id uuid,
  artist text,
  album text,
  title text,
  song_order int,
  song_id uuid,
  PRIMARY KEY ((id),song_order))
WITH CLUSTERING ORDER BY (song_order ASC);

With this table definition, I can query a particular playlist by id (the partition key). Within each id, the data will be returned ordered by song_order:

SELECT id, song_order, album, artist, title 
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY song_order DESC;

id                                   | song_order | album                 | artist         | title
------------------------------------------------------------------------------------------------------------------
62c36092-82a1-3a00-93d1-46196ee77204 | 4          | No One Rides For Free |      Fu Manchu |             Ojo Rojo    
62c36092-82a1-3a00-93d1-46196ee77204 | 3          |             Roll Away | Back Door Slam |  Outside Woman Blues
62c36092-82a1-3a00-93d1-46196ee77204 | 2          |          We Must Obey |      Fu Manchu |     Moving in Stereo
62c36092-82a1-3a00-93d1-46196ee77204 | 1          |          Tres Hombres |         ZZ Top |            La Grange

In this example, I only need to specify an ORDER BY if I want to switch the sort direction. As the rows are stored in ASCending order, I need to specify DESC to see them in DESCending order. If I am fine with getting the rows back in ASCending order, I don't need to specify ORDER BY at all.
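If descending order is what you want most of the time, the direction can be baked into the table instead of being requested per query. The following is a minimal sketch of that idea, not part of the DataStax example: the table name playlists_desc is hypothetical, and the columns are simply copied from the definition above.

```cql
-- Hypothetical variant of the playlists table (name is an assumption).
-- Same columns, but rows inside each partition are stored with the
-- highest song_order first.
CREATE TABLE playlists_desc (
  id uuid,
  artist text,
  album text,
  title text,
  song_order int,
  song_id uuid,
  PRIMARY KEY ((id), song_order))
WITH CLUSTERING ORDER BY (song_order DESC);

-- No ORDER BY needed here: rows come back in stored (descending) order.
SELECT id, song_order, album, artist, title
FROM playlists_desc WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- ORDER BY is now only needed to flip back to ascending.
SELECT id, song_order, album, artist, title
FROM playlists_desc WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY song_order ASC;
```

Either way the choice is made at table creation time, which is the point of deciding on your queries up front.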

But what if I want to order by artist? Or album? Or both? Since one artist can have many albums (for this example), we'll modify the PRIMARY KEY definition like this:

PRIMARY KEY ((id),artist,album,song_order)

Running the same query above (minus the ORDER BY) produces this output:

SELECT id, song_order, album, artist, title 
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;

id                                   | song_order | album                 | artist         | title
------------------------------------------------------------------------------------------------------------------
62c36092-82a1-3a00-93d1-46196ee77204 | 3          |             Roll Away | Back Door Slam |  Outside Woman Blues
62c36092-82a1-3a00-93d1-46196ee77204 | 4          | No One Rides For Free |      Fu Manchu |             Ojo Rojo    
62c36092-82a1-3a00-93d1-46196ee77204 | 2          |          We Must Obey |      Fu Manchu |     Moving in Stereo
62c36092-82a1-3a00-93d1-46196ee77204 | 1          |          Tres Hombres |         ZZ Top |            La Grange

Notice that the rows are now ordered by artist, and then album. If we had two songs from the same album, then song_order would be next.
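To make the rules concrete, here is a rough sketch of which ORDER BY clauses this modified key should accept. It assumes the table was created with PRIMARY KEY ((id),artist,album,song_order) and the default ascending clustering order; the comments describe the behavior I would expect, so check them against your Cassandra version.

```cql
-- Accepted: follows the clustering columns in their declared order.
SELECT id, song_order, album, artist, title
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY artist ASC, album ASC;

-- Accepted: the whole clustering order reversed
-- (every listed column flipped to DESC).
SELECT id, song_order, album, artist, title
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY artist DESC, album DESC, song_order DESC;

-- Rejected: mixing directions does not match the stored order
-- or its exact reverse.
-- SELECT id, song_order, album, artist, title
-- FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
-- ORDER BY artist ASC, album DESC;

-- Narrowing by a leading clustering column keeps the remaining order:
-- these rows come back sorted by album, then song_order.
SELECT id, song_order, album, artist, title
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
AND artist = 'Fu Manchu';
```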

So now you might ask "what if I just want to sort by album, and not artist?" You can sort just by album, but not with this table. You cannot skip clustering keys in your ORDER BY clause. In order to sort only by album (and not artist) you'll need to design a different query table. Sometimes Cassandra data modeling will have you duplicating your data a few times, to be able to serve different queries...and that's ok.
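As a sketch of what such a second query table could look like: the name playlists_by_album is my own invention, the song_id value is a placeholder, and the first INSERT assumes the playlists table with the modified primary key shown earlier. The application writes each song to both tables, here grouped in a logged batch so that both writes are eventually applied together.

```cql
-- Hypothetical query table (name is an assumption): same data as
-- playlists, but clustered by album first so no artist is needed
-- to get album order.
CREATE TABLE playlists_by_album (
  id uuid,
  album text,
  song_order int,
  artist text,
  title text,
  song_id uuid,
  PRIMARY KEY ((id), album, song_order))
WITH CLUSTERING ORDER BY (album ASC, song_order ASC);

-- Write the same song to both tables. The song_id is a placeholder.
BEGIN BATCH
  INSERT INTO playlists (id, artist, album, song_order, title, song_id)
  VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'ZZ Top', 'Tres Hombres',
          1, 'La Grange', 7db1a490-5878-11e2-bcfd-0800200c9a66);
  INSERT INTO playlists_by_album (id, album, song_order, artist, title, song_id)
  VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'Tres Hombres', 1,
          'ZZ Top', 'La Grange', 7db1a490-5878-11e2-bcfd-0800200c9a66);
APPLY BATCH;

-- Rows for this playlist now come back ordered by album, then song_order.
SELECT id, song_order, album, artist, title
FROM playlists_by_album WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
```

The cost is extra storage and an extra write per song, which is usually an acceptable trade in Cassandra for a read that is served straight from one partition in the order you need.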

For more detail on how to build data models while leveraging clustering order, check out these two articles on PlanetCassandra:

  • Getting Started With Time Series Data Modeling, by Patrick McFadin
  • We Shall Have Order! (disclaimer: I am the author)
