Cassandra Data modelling: Timestamp as partition keys

Problem description

I need to be able to return all users that performed an action during a specified interval. The table definition in Cassandra is just below:

create table t ("from" timestamp, "to" timestamp, user text, PRIMARY KEY(("from", "to"), user))

I'm trying to implement the following query in Cassandra:

select * from t WHERE "from" > :startInterval and "to" < :toInterval

However, this query will obviously not work because it represents a range query on the partition key, forcing Cassandra to search all nodes in the cluster, defeating its purpose as an efficient database.

Is there an efficient way to model this query in Cassandra?

My solution would be to split both timestamps into their corresponding years and months and use those as the partition key. The table would look like this:

 create table t_updated (yearFrom int, monthFrom int, yearTo int, monthTo int, "from" timestamp, "to" timestamp, user text, PRIMARY KEY((yearFrom, monthFrom, yearTo, monthTo), user))

If I wanted the users who performed the action between January 2017 and July 2017, the query would look like this:

select user from t_updated where yearFrom IN (2017) and monthFrom IN (1,2,3,4,5,6,7) and yearTo IN (2017) and monthTo IN (1,2,3,4,5,6,7)
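To make the cost of this query concrete, the following plain Python sketch (not a Cassandra client) expands the four IN lists the way restrictions on a composite partition key are combined: as a cartesian product, where each resulting key is a separate partition lookup. The feasibility check is an assumption layered on top: it supposes every row satisfies from <= to, so keys whose end month precedes the start month can never hold data. The helper names are hypothetical.

```python
from itertools import product

# IN-list values copied from the t_updated query above.
YEARS_FROM = [2017]
MONTHS_FROM = [1, 2, 3, 4, 5, 6, 7]
YEARS_TO = [2017]
MONTHS_TO = [1, 2, 3, 4, 5, 6, 7]


def expand_in_lists(years_from, months_from, years_to, months_to):
    """Partition keys (yearFrom, monthFrom, yearTo, monthTo) addressed by the
    IN restrictions: the cartesian product of the four lists."""
    return list(product(years_from, months_from, years_to, months_to))


def is_feasible(key):
    """True if the key could hold a row, assuming every row has from <= to."""
    year_from, month_from, year_to, month_to = key
    return (year_from, month_from) <= (year_to, month_to)


partitions = expand_in_lists(YEARS_FROM, MONTHS_FROM, YEARS_TO, MONTHS_TO)
feasible = [key for key in partitions if is_feasible(key)]
print(len(partitions), len(feasible))  # 49 28
```

For January to July 2017 this gives 49 partition keys, of which only 28 can contain rows; an interval that crosses a year boundary would lengthen the lists and grow the product further.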

Would there be a better way to model this query in Cassandra? How would you approach this issue?

Recommended answer

First, the partition key can only be restricted with the equality operator (= or IN). It is better to use PRIMARY KEY (BUCKET, TIME_STAMP) here, where bucket is a combination of year and month (or also day, hour, etc., depending on how large your data set is).
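A minimal Python sketch of this bucketing idea, assuming monthly buckets encoded as 'YYYY-MM' text and each action stored as one row with a single timestamp. The table name `user_actions_by_month`, the helper names, and the extra `user` clustering column (added so that two actions at the same instant do not overwrite each other) are hypothetical, not part of the answer.

```python
from datetime import datetime

# Hypothetical schema following the answer's PRIMARY KEY (bucket, time_stamp)
# idea; `user` is an extra clustering column (an assumption) so that two actions
# with the same timestamp in the same month do not overwrite each other.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS user_actions_by_month (
    bucket text,
    time_stamp timestamp,
    user text,
    PRIMARY KEY (bucket, time_stamp, user)
)"""

# Hypothetical insert statement with %s placeholders for bound values.
INSERT_CQL = (
    "INSERT INTO user_actions_by_month (bucket, time_stamp, user) "
    "VALUES (%s, %s, %s)"
)


def month_bucket(ts: datetime) -> str:
    """Map a timestamp to its monthly partition key, e.g. '2017-03'.

    Writers and readers must derive buckets the same way (for example,
    always from UTC timestamps), or rows land in unexpected partitions.
    """
    return f"{ts.year:04d}-{ts.month:02d}"


def insert_params(ts: datetime, user: str) -> tuple:
    """Bind values for INSERT_CQL: the bucket is derived from the timestamp."""
    return (month_bucket(ts), ts, user)
```

Finer buckets (adding the day or hour to the key) keep partitions smaller at the cost of more queries per interval, which is the trade-off the answer points at.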

It is better to execute multiple queries, one per bucket, and combine the results on the client side.
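The client-side part could look like the following Python sketch. It assumes a hypothetical `user_actions_by_month` table (partition key `bucket` as 'YYYY-MM' text, clustering column `time_stamp`): for an interval it enumerates every monthly bucket from the start month through the end month, runs one single-partition query per bucket with a range on the clustering column, and merges the users into a set. The `execute` parameter is any callable taking a CQL string and a parameter tuple; with the DataStax Python driver, `session.execute` has that shape, and `%s` is that driver's placeholder for non-prepared statements.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Set

# Hypothetical query against the hypothetical user_actions_by_month table:
# equality on the partition key, range on the clustering column.
SELECT_USERS_CQL = (
    "SELECT user FROM user_actions_by_month "
    "WHERE bucket = %s AND time_stamp >= %s AND time_stamp < %s"
)


def month_buckets(start: datetime, end: datetime) -> List[str]:
    """Every 'YYYY-MM' bucket from start's month through end's month, inclusive."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return buckets


def users_in_interval(
    execute: Callable[[str, tuple], Iterable[tuple]],
    start: datetime,
    end: datetime,
) -> Set[str]:
    """Users with at least one action in [start, end), one query per bucket."""
    users: Set[str] = set()
    for bucket in month_buckets(start, end):
        # Each call touches a single partition; the time range is applied
        # to the clustering column inside that partition.
        for row in execute(SELECT_USERS_CQL, (bucket, start, end)):
            users.add(row[0])  # merge on the client, de-duplicating users
    return users
```

The queries run one after another here; for long intervals they could be issued concurrently instead (the DataStax driver provides `session.execute_async`). A set is used because the same user can appear in several buckets.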
