聚类键范围查询 [英] Range query on clustering key

查看:48
本文介绍了聚类键范围查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表,用于记录在我的网站上执行的用户活动.我的表结构如下:

I have a table where I am logging user activity performed on my website. My table structure looks like:

CREATE TABLE key_space.log (
    id uuid,
    time bigint,
    ip text,
    url text,
    user_id int,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC)

现在,我想获取最近5分钟内出现的所有记录.

Now I want to fetch all the records which came in last 5 minutes.

为此,我正在使用

select * from key_space.log where 
  time>current_timestamp - 5 minutes ALLOW FILTERING;

但是此查询未返回任何结果&我收到timedoutexception错误.如何解决这个问题呢?在这方面的任何帮助将不胜感激.

But this query is not returning any result & i am getting timedoutexception error. How to solve this problem? Any help on this would be really appreciated.

推荐答案

与所有Cassandra模型一样,您需要首先构建专门用于支持该查询的表 .即使您可以使其与当前表一起使用,它也必须扫描群集中的每个节点,这可能会超时(如您所见).

As with all Cassandra models, you'll need to start by building a table specifically designed to support that query. Even if you could make it work with your current table, it would have to scan every node in the cluster, which would probably time-out (as you are seeing).

执行此操作的一种方法是使用时间存储桶"作为分区键.如果您只关心最后5分钟的记录,那么只要您每天没有数百万的新记录,那么"day"应该有效.如果这样做,那么您需要为存储桶"使用更小的时间分量.

One way to do this will be to use a time "bucket" as a partition key. If you just care about records for the last 5 minutes, then "day" should work, as long as you don't get millions of new records per day. If you do, then you'll need a smaller time component for your "bucket."

CREATE TABLE log_by_day (
    id uuid,
    day bigint,
    time bigint,
    ip text,
    url text,
    user_id int,
    PRIMARY KEY (day, time, id)
) WITH CLUSTERING ORDER BY (time DESC, id ASC);

现在这样的查询将起作用:

Now a query like this will work:

aaron@cqlsh:stackoverflow> SELECT day,time,id,user_id FROM log_by_day
  WHERE day=201920 AND time > 1563635871941;

 day    | time          | id                                   | user_id
--------+---------------+--------------------------------------+---------
 201920 | 1563635872259 | 7fef03da-6c23-4bf2-9e98-fd126ab17944 |    1234
 201920 | 1563635872259 | 9a0c49ce-5ad2-45c5-b570-cd9de1c060d1 |    4607
 201920 | 1563635872209 | 9227166e-cda2-4909-b8ac-4168922a0128 |    2112

(3 rows)

提示:对唯一的 id 进行分区非常适合用于数据分发,但在查询灵活性方面却没有提供太多帮助.

Pro-tip: Partitioning on a unique id is great for data distribution, but doesn't give you much in the way of query flexibility.

这篇关于聚类键范围查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆