Query results not ordered despite WITH CLUSTERING ORDER BY

Question

I am storing posts from all users in a table. I want to retrieve the posts from all of the users that a given user follows.

CREATE TABLE posts (
  userid int,
  time timestamp,
  id uuid,
  content text,
  PRIMARY KEY (userid, time)
) WITH CLUSTERING ORDER BY (time DESC);

I keep the data about whom each user follows in another table:

CREATE TABLE follow (
  userid int,
  who_follow_me set<int>,
  who_i_follow set<int>,
  PRIMARY KEY ((userid))
);

I am running this query:

select * from posts where userid in(1,2,3,4....n);

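Two questions:

  1. Why do I still get data in random order, even though CLUSTERING ORDER BY is specified on posts?

  2. Is the model correct to satisfy the query optimally (a user can have n followers)?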

I am using Cassandra 2.0.10.

Recommended answer

"why I still get data in random order, though CLUSTERING ORDER BY is specified in posts?"

This is because CLUSTERING ORDER BY only orders rows within a single partition, that is, rows that share one partition key value. An IN query over several userid values reads several partitions, and no ordering holds across them. So in your case, if you query all of the posts for one specific user like this:

SELECT * FROM posts WHERE userid=1;

That would return your results ordered by time, because all of the rows within the userid=1 partition are clustered by it.
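To make the per-partition scope concrete, here is a small sketch against the original posts table. The userid value and the LIMIT are made up for illustration; the point is only that every statement addresses exactly one partition, which is where the clustering order applies.

```
-- Rows come back newest first because of CLUSTERING ORDER BY (time DESC).
SELECT * FROM posts WHERE userid = 1;

-- The order can be reversed per query, still within the single partition.
SELECT * FROM posts WHERE userid = 1 ORDER BY time ASC;

-- Only the ten most recent posts of this user.
SELECT * FROM posts WHERE userid = 1 LIMIT 10;
```

If the original table has to stay as it is, one possible approach (an assumption, not something tested here) is to run one such query per followed user and merge the already sorted results in the application.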

"Is model correct to satisfy the query optimally (user can have n number of followers)?"

It will work, as long as you don't care about getting the results ordered by timestamp. To query posts for all users ordered by time, you would need a different partition key. Without knowing much about your application, you could add a column such as group (for instance) and partition on that.

So let's say that you evenly assign all of your users to eight groups: A, B, C, D, E, F, G and H, and that your table design changes to this:

CREATE TABLE posts (
  group text,
  userid int,
  time timestamp,
  id uuid,
  content text,
  PRIMARY KEY (group, time, userid)
) WITH CLUSTERING ORDER BY (time DESC);

You could then query all posts for all users in group B like this:

SELECT * FROM posts WHERE group='B';

That would give you all of the posts for all of the users in group B, ordered by time. So basically, for your query to order the posts by time, you need to partition your post data on something other than userid.
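As a rough sketch of how the group-partitioned table might be used, the statements below insert one row and read it back. The sample values (the group letter, timestamp, uuid and content) are hypothetical and only serve to show the shape of the statements.

```
-- Hypothetical sample row; 'B' is the group this user was assigned to.
INSERT INTO posts (group, time, userid, id, content)
VALUES ('B', '2015-01-25 13:27:00-0600', 1,
        559cda12-8fe7-45d3-9a61-7ddd2119fcda, 'hello');

-- The 20 most recent posts in group B, across all users in that group.
SELECT * FROM posts WHERE group = 'B' LIMIT 20;

-- Posts in group B since a given time, newest first.
SELECT * FROM posts WHERE group = 'B' AND time >= '2015-01-25 00:00:00-0600';
```

Because time is the first clustering column, a range restriction on it stays inside the partition and keeps the clustering order. With only eight groups, each partition holds roughly an eighth of all posts, so partition size is worth watching.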

EDIT

PRIMARY KEY (userid, follows)) WITH CLUSTERING ORDER BY (created DESC);

That's not going to work. In fact, that should produce the following error:

code=2200 [Invalid query] message="Missing CLUSTERING ORDER for column follows"

And even if you did add follows to your CLUSTERING ORDER clause, you would see this:

code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"

The CLUSTERING ORDER clause can only be used on clustering columns, which in this case is only the follows column. Alter your PRIMARY KEY definition to include created as a second clustering column, and cluster on follows (ASC) and created (DESC). I have tested this, inserted some sample data, and can see that this query works:

aploetz@cqlsh:stackoverflow> SELECT * FROM posts WHERE userid=2 AND follows=1;

 userid | follows | created                  | id
--------+---------+--------------------------+--------------------------------------
      2 |       1 | 2015-01-25 13:27:00-0600 | 559cda12-8fe7-45d3-9a61-7ddd2119fcda
      2 |       1 | 2015-01-25 13:26:00-0600 | 64b390ba-a323-4c71-baa8-e247a8bc9cdf
      2 |       1 | 2015-01-25 13:24:00-0600 | 1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4

(3 rows)
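The table definition behind this output is not shown in the answer. A minimal sketch of what it might look like follows; the column types are assumptions inferred from the sample output, not taken from the original schema.

```
-- Assumed column types: int for userid/follows, timestamp for created, uuid for id.
CREATE TABLE posts (
  userid int,
  follows int,
  created timestamp,
  id uuid,
  PRIMARY KEY (userid, follows, created)
) WITH CLUSTERING ORDER BY (follows ASC, created DESC);
```

Here userid is the partition key, while follows and created are the clustering columns, listed in the CLUSTERING ORDER clause in the same order as in the PRIMARY KEY.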

If you query by userid alone, you get the posts from all of your followers. In that case, though, the posts are only ordered within each follows value, like this:

aploetz@cqlsh:stackoverflow> SELECT * FROM posts WHERE userid=2;

 userid | follows | created                  | id
--------+---------+--------------------------+--------------------------------------
      2 |       0 | 2015-01-25 13:28:00-0600 | 94da27d0-e91f-4c1f-88f2-5a4bbc4a0096
      2 |       0 | 2015-01-25 13:23:00-0600 | 798053d3-f1c4-4c1d-a79d-d0faff10a5fb
      2 |       1 | 2015-01-25 13:27:00-0600 | 559cda12-8fe7-45d3-9a61-7ddd2119fcda
      2 |       1 | 2015-01-25 13:26:00-0600 | 64b390ba-a323-4c71-baa8-e247a8bc9cdf
      2 |       1 | 2015-01-25 13:24:00-0600 | 1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4

(5 rows)
